Abstract

We consider the power of "linear reconstruction attacks" in statistical data privacy, showing that 
they can be applied to a much wider range of settings than previously understood. Linear attacks have 
been studied before [3, 6, 11, 1, 14] but have so far been applied only in settings with releases that are 
"obviously" linear. 

Consider a database curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc.). This setting is widely considered in the literature, partly since it arises with medical data. Specifically, we show one can mount linear reconstruction attacks based on any release that gives:

1. the fraction of records that satisfy a given non-degenerate boolean function. Such releases include
contingency tables (previously studied by Kasiviswanathan et al. [11]) as well as more complex 
outputs like the error rate of classifiers such as decision trees; 

2. any one of a large class of M-estimators (that is, the output of empirical risk minimization algorithms), including the standard estimators for linear and logistic regression.

We make two contributions; first, we show how these types of releases can be transformed into a linear 
format, making them amenable to existing polynomial-time reconstruction algorithms. This is already 
perhaps surprising, since many of the above releases (like M-estimators) are obtained by solving highly
nonlinear formulations. 

Second, we show how to analyze the resulting attacks under various distributional assumptions on 
the data. Specifically, we consider a setting in which the same statistic (either 1 or 2 above) is released 
about how the sensitive attribute relates to all subsets of size k (out of a total of d) nonsensitive boolean 
attributes. 

1 Introduction 

The goal of private data analysis is to provide global, statistical properties of a database of sensitive information while protecting the privacy of the individuals whose records the database contains. There is a vast body of work on this problem in statistics and computer science.

Until a few years ago, most schemes proposed in the literature lacked rigor: typically, the schemes had 
either no formal privacy guarantees or ensured security only against a specific suite of attacks. The seminal 
results of Dinur and Nissim [3] and Dinur, Dwork and Nissim [2] initiated a rigorous study of the tradeoff 
between privacy and utility. The notion of differential privacy (Dwork, McSherry, Nissim and Smith [5], 
Dwork [4]) that emerged from this line of work provides rigorous guarantees even in the presence of a malicious adversary with access to arbitrary side information. Differential privacy requires, roughly, that any single individual's data have little effect on the outcome of the analysis. Recently, many techniques have been developed for designing differentially private algorithms. A typical objective is to release as accurate an approximation as possible to some high-dimensional function $f$ evaluated on the database $D$.

A complementary line of work seeks to establish lower bounds on how much distortion is necessary for particular functions $f$. Some of these bounds apply only to specific notions of privacy (e.g., lower bounds for differential privacy [5, 8, 9, 13, 1]). A second class of bounds rules out any reasonable notion of privacy by giving algorithms to reconstruct almost all of the data $D$ given sufficiently accurate approximations to $f(D)$ [3, 6, 7, 11, 1]. We refer to the latter results as reconstruction attacks.

We consider reconstruction attacks against attribute privacy: consider a curator who manages a database 
of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the 
database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc.). This setting is widely
considered in the applied data privacy literature, partly since it arises with medical and retail data. 

For concreteness, consider a database $D$ that contains, for each individual $i$, a sensitive attribute $s_i \in \{0,1\}$ as well as some other information $U_i \in \mathbb{R}^d$ which is assumed to be known to the attacker. The $i$th record is thus $(U_i, s_i)$. We denote the entire database $D = (U \mid s)$ where $U \in \mathbb{R}^{n \times d}$, $s \in \{0,1\}^n$, and $\mid$ denotes concatenation. Given some released information $y$, the attacker constructs an estimate $\hat{s}$ that she hopes is close to $s$. We measure the attack's success in terms of the Hamming distance $d_H(s, \hat{s})$. A scheme is not attribute private if an attacker can consistently get an estimate that is within distance $o(n)$. Formally:

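Definition 1 (Failure of Attribute Privacy).¹ A (randomized) mechanism $\mathcal{M} : \mathbb{R}^{n \times (d+1)} \to \mathbb{R}^m$ is said to allow $(\alpha, \beta)$ attribute reconstruction if there exists a setting of the nonsensitive attributes $U \in \mathbb{R}^{n \times d}$ and an algorithm (adversary) $\mathcal{A} : \mathbb{R}^{n \times d} \times \mathbb{R}^m \to \mathbb{R}^n$ such that for every $s \in \{0,1\}^n$,

$$\Pr_{y \leftarrow \mathcal{M}((U \mid s))}\left[\mathcal{A}(U, y) = \hat{s} : d_H(s, \hat{s}) \le \alpha\right] \ge 1 - \beta.$$

Asymptotically, we say that a mechanism is attribute nonprivate if there is an infinite sequence of $n$ for which $\mathcal{M}$ allows $(o(n), o(1))$-reconstruction. Here $d = d(n)$ is a function of $n$. We say the attack $\mathcal{A}$ is efficient if it runs in time $\mathrm{poly}(n, d)$.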

Instead of simply showing that a setting of U exists, we will normally aim to show that reconstruction 
is possible with high probability when U is chosen from one of a class of natural distributions. 

Linear Reconstruction Attacks. In this paper, we consider the power of linear reconstruction attacks. Given the released information $y$, the attacker constructs a system of approximate linear equalities, namely a matrix $A$ and vector $z$ such that $As \approx z$, and attempts to solve for $s$. A typical algorithmic approach is to find $\hat{s}$ which minimizes some norm ($\ell_2$ or $\ell_1$) of the error $(A\hat{s} - z)$. Minimizing the $\ell_2$ error is known as least squares decoding and minimizing the $\ell_1$ error is known as LP decoding. One sometimes also considers algorithms that exhaustively search over all $2^n$ possible choices for $s$ (as in [3, 14]).
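To make the exhaustive-search style concrete, here is a minimal Python sketch (NumPy assumed available) that tries all $2^n$ candidates and keeps the one minimizing $\|As - z\|$. The function name `exhaustive_decode`, the sizes, and the seed are illustrative choices, not taken from [3, 14]. When $A$ has full column rank and every noise entry has magnitude below $1/2$, the true $s$ is the unique minimizer: for any other candidate $s'$, the vector $A(s' - s)$ is a nonzero integer vector, so its squared norm exceeds twice its inner product with the noise. The $2^n$ running time is what motivates the polynomial-time decoders discussed next.

```python
import itertools

import numpy as np


def exhaustive_decode(A, z):
    """Return the s in {0,1}^n minimizing ||A s - z|| by trying all 2^n candidates."""
    n = A.shape[1]
    best_s, best_err = None, np.inf
    for bits in itertools.product((0, 1), repeat=n):
        s = np.array(bits)
        err = np.linalg.norm(A @ s - z)
        if err < best_err:                        # keep the first strict minimizer
            best_s, best_err = s, err
    return best_s


rng = np.random.default_rng(1)
m, n = 40, 10                                     # tiny n: 2^10 = 1024 candidates
A = rng.integers(0, 2, size=(m, n)).astype(float) # random 0/1 query matrix
s_true = rng.integers(0, 2, size=n)
z = A @ s_true + rng.uniform(-0.3, 0.3, size=m)   # every |e_i| < 1/2
s_hat = exhaustive_decode(A, z)
print("recovered exactly:", np.array_equal(s_hat, s_true))
```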

Such attacks were first considered in the context of data privacy by Dinur and Nissim [3]. They showed that any mechanism which answers (or allows the user to compute) $\Omega(n \log n)$ random inner product queries with $\{0,1\}$ vectors on a database $s \in \{0,1\}^n$ with $o(\sqrt{n})$ noise per query is not private. That is, they assume that the mechanism releases $y = As + e$, where $A$ is a random matrix in $\{0,1\}^{\Omega(n \log n) \times n}$ and $e$ is a noise vector with $\|e\|_\infty = o(\sqrt{n})$. Their attack was subsequently extended to use a linear number of queries [6], allow a small fraction of answers to be arbitrarily distorted [6], and run significantly more quickly [7].

¹This definition generalizes blatant non-privacy (Dinur and Nissim [3]) and first appeared in [11]. The order of the qualifiers here has been changed, correcting an error pointed out by Graham Cormode.

In their simplest form, such inner product queries require the adversary to be able to "name rows", that is, specify a coefficient for each component of the vector $s$. Thus, the lower bound does not seem to apply to any functionality that is symmetric in the rows of the database (such as, for example, "counting queries").²

Natural Queries. This paper focuses on linear attacks mounted based on the release of natural, symmetric statistics. A first attack along these lines appeared in a previous work of ours (together with J. Ullman) [11] in which we analyzed the release of marginal tables (also called contingency tables). Specifically, in [11], we showed that any mechanism which releases the marginal distributions of all subsets of $k + 1$ attributes³ with $o(\sqrt{n})$ noise per entry is attribute non-private if $d = \tilde{\Omega}(n^{1/k})$.⁴ These noise bounds were improved in [1], which presented an attack that can tolerate a constant fraction of entries with arbitrarily high noise, as long as the remaining positions have $o(\sqrt{n})$ noise. We generalize both these results in this paper.

Recently, linear attacks were also considered based on range query releases [14] (which, again, are 
natural, linear queries). 

Our Results. We greatly expand the applicability of linear attacks in "natural" settings. Specifically, we 
show one can mount linear reconstruction attacks based on any release that gives: 

1. the fraction of records that satisfy a given non-degenerate boolean function (a boolean function over $p$ variables is non-degenerate if its multilinear representation has degree exactly $p$). Such releases include contingency tables as well as more complex outputs like the error rate of certain classifiers such as decision trees; or

2. the M-estimator associated with a differentiable loss function. M-estimators are a broad class of estimators which are obtained by minimizing sums of functions of the records (they are also called empirical risk minimization estimators). M-estimators include the standard estimators for linear and logistic regression (both these estimators are associated with differentiable loss functions). See Section 4 for definitions.

Our contributions are two-fold. First, we show how these types of releases can be transformed into 
a (noisy) linear release problem, making them amenable to linear reconstruction attacks. This is already 
perhaps surprising, since many of the above statistics (like M-estimators) are obtained by solving highly
nonlinear formulations. After performing this transformation, we can apply polynomial-time methods (like 
least squares or LP decoding) on this linear release problem to estimate the sensitive data. 

Second, we show how to analyze these attacks under various distributional assumptions on the data. This gives lower bounds on the noise needed to release these statistics attribute privately. Specifically, we consider a setting in which the same statistic (either 1 or 2 above) is released about how the sensitive attribute relates to all subsets of (constant) size $k$ (out of a total of $d$) nonsensitive boolean attributes. For a subset $J \subseteq [d]$ of size $k$, let $U|_J$ denote the submatrix of $U$ consisting of the columns in $J$.

²It was pointed out in [2] that in databases with more than one entry per row, random inner product queries on the sensitive attribute vector $s$ can be simulated via hashing: for example, the adversary could ask for the sum of the function $H(U_i) \cdot s_i$ over the whole database, where $H : \{0,1\}^d \to \{0,1\}$ is an appropriate hash function. This is a symmetric statistic, but it is unlikely to come up in a typical statistical publication.

³For asymptotic statements, $k$ is considered constant in this paper, as in previous works [11, 1].

⁴The $\tilde{\Omega}$ notation hides polylogarithmic factors.

The $\binom{d}{k}$ entries for a statistic are obtained by evaluating the same statistic on $(U|_J \mid s)$ for all sets $J$ of size $k$. Specifically:

• Consider a mechanism which releases, for every set $J$ of size $k$, the fraction of records (rows) in $(U|_J \mid s)$ satisfying a non-degenerate boolean function over $k + 1$ variables. We show that if the mechanism adds $o(1/\sqrt{n})$ noise per entry and if $d = \tilde{\Omega}(n^{1/k})$, then it is attribute non-private.

• Consider a mechanism which releases, for every set $J$ of size $k$, a particular M-estimator evaluated over $(U|_J \mid s)$. We show that if the mechanism adds $o(1/(\Lambda\sqrt{n}))$ noise per entry and if $d = \Omega(n)$, then it is attribute non-private. Here, $\Lambda$ is the Lipschitz constant of the loss function gradient. The loss function also needs to satisfy a mild variance condition. For the case of linear and logistic regression estimators, $\Lambda = O(1)$ for bounded data, and so the noise bound is $o(1/\sqrt{n})$.

The statements above are based on the least squares attack. For most settings, we show the LP decoding 
attack can also handle a constant fraction of entries with arbitrarily high noise (the exception is the setting 
of general M-estimators). 

Techniques for Deriving the Attacks. Casting the releases as a system of linear equations requires two simple insights which we hope will be broadly useful. First, we note that when $s = (s_1, \ldots, s_n)$ is boolean, then any release which allows us to derive an equation which involves a sum over database records can in fact be made linear in $s$. Specifically, suppose we know that $\sum_i g_i(s_i) = t$, where $t$ is a real number and $g_i$ is an arbitrary real-valued function that could depend on the index $i$, the public record $U_i$, and any released information. We can rewrite $g_i(s_i)$ as $g_i(0) + s_i(g_i(1) - g_i(0))$; the constraint $\sum_i g_i(s_i) = t$ can then be written as $\sum_i s_i \cdot (g_i(1) - g_i(0)) = t - \sum_i g_i(0)$, which is affine in $s$. This allows us to derive linear constraints from a variety of not-obviously-linear releases; for example, it allows us to get linear attacks based on the error rate of a given binary classifier (see Section 3).
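As a hedged illustration of the first insight, the plain-Python sketch below linearizes a released error rate of a fixed binary classifier whose predictions $c(U_i)$ on the public records are assumed to be computable by the attacker. Here $g_i(s_i) = \mathbf{1}[c(U_i) \ne s_i]$, so $g_i(0) = c(U_i)$ and $g_i(1) = 1 - c(U_i)$, and the release $t = n \cdot (\text{error rate})$ becomes one affine constraint on $s$. The helper name `error_rate_constraint` and the toy data are hypothetical.

```python
def error_rate_constraint(predictions, error_rate):
    """Linearize a released error rate of a fixed, public binary classifier.

    predictions[i] = c(U_i) in {0, 1} is computable from public data, and
    g_i(s_i) = 1[c(U_i) != s_i], so g_i(0) = c(U_i) and g_i(1) = 1 - c(U_i).
    Returns (a, rhs) such that sum_i a[i] * s[i] == rhs for the hidden labels s.
    """
    n = len(predictions)
    g0 = list(predictions)                     # g_i(0)
    g1 = [1 - c for c in predictions]          # g_i(1)
    a = [g1[i] - g0[i] for i in range(n)]      # coefficient of s_i
    rhs = n * error_rate - sum(g0)             # t - sum_i g_i(0), with t = n * error rate
    return a, rhs


predictions = [1, 0, 1, 1, 0]                  # hypothetical classifier outputs c(U_i)
s = [1, 1, 0, 1, 0]                            # hidden sensitive bits (unknown to attacker)
error_rate = sum(c != si for c, si in zip(predictions, s)) / len(s)   # the release
a, rhs = error_rate_constraint(predictions, error_rate)
print(a, rhs)
```

Each released error rate yields one such row; releasing the error rates of many classifiers stacks these rows into a linear system $As \approx z$.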

The second observation is that for many nonlinear optimization problems, every optimal solution must satisfy constraints that are, in fact, sums over data records. For example, for M-estimators associated with differentiable loss functions, the gradient at the solution $\hat{\theta}$ must equal 0, leading to an equation of the form

$$\sum_{i=1}^{n} \nabla_{\theta}\, \ell(\hat{\theta}; (U_i, s_i)) = 0.$$

This can be made linear in $s$ using the first technique. We bound the effect of any noise added to the entries of the M-estimator ($\hat{\theta}$) via the Lipschitz properties of the gradient of the loss function $\ell$.
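A minimal NumPy sketch of the second insight for logistic regression, under the assumption that the released M-estimator is the unregularized logistic regression fit of the sensitive bit $s_i$ on the public row $U_i$ (with an intercept column), released without noise. The first-order condition $\sum_i U_i(\mathrm{sigmoid}(\langle U_i, \hat\theta\rangle) - s_i) = 0$ gives $d$ equations $U^T s = U^T \mathrm{sigmoid}(U\hat\theta)$ that are linear in $s$ once $\hat\theta$ and $U$ are known. The Newton fitting routine, the helper names, and the data sizes are illustrative assumptions, not the procedure of Section 4.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(U, s, iters=25):
    """Unregularized logistic regression of s on U by Newton's method (illustrative)."""
    theta = np.zeros(U.shape[1])
    for _ in range(iters):
        p = sigmoid(U @ theta)
        grad = U.T @ (p - s)                          # gradient of the summed log-loss
        hess = U.T @ (U * (p * (1 - p))[:, None])     # its Hessian
        theta -= np.linalg.solve(hess, grad)
    return theta


def constraints_from_estimator(U, theta_hat):
    """Zero-gradient condition sum_i U_i (sigmoid(<U_i, theta_hat>) - s_i) = 0, rewritten
    as d equations U^T s = U^T sigmoid(U theta_hat) that are linear in s."""
    return U.T, U.T @ sigmoid(U @ theta_hat)


rng = np.random.default_rng(0)
n, d = 200, 3
U = rng.integers(0, 2, size=(n, d)).astype(float)    # public boolean attributes
U[:, 0] = 1.0                                         # intercept column (an assumption)
s = rng.integers(0, 2, size=n).astype(float)         # sensitive bits
theta_hat = fit_logistic(U, s)                        # the released M-estimator
coef, rhs = constraints_from_estimator(U, theta_hat)
print(np.max(np.abs(coef @ s - rhs)))                 # s satisfies the linear system
```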

Techniques for Analyzing the Attacks. The techniques just mentioned give rise to a variety of linear 
reconstruction attacks, depending on the type of released statistics. We can provide theoretical guarantees 
on the performance of these attacks in some settings, for example when the same statistic is released about 
many subsets of the data (e.g., all sets of a given size $k$) and when the data records themselves are drawn i.i.d.
from some underlying distribution. The main technique here is to analyze the geometry of the constraint 
matrix A that arises in the attack. For the case of non-degenerate boolean functions, we do so by relating the 
constraint matrix to a row-wise product of a matrix with i.i.d. entries (referred to as a random row product 
matrix, see Section 3.2.1), which was recently analyzed by Rudelson [17] (see also [11]). The results of [17] 
showed that the least singular value of a random row product matrix is asymptotically the same as that of a 
matrix of the same dimensions with i.i.d. entries, and a random row product matrix is a Euclidean section. Our
results show that a much broader class of matrices with correlated rows satisfy these properties. 

Organization. In Section 2, we introduce some notation and review the least squares and LP decoding techniques for solving noisy linear systems. In Section 3, we present our results on evaluating non-degenerate boolean functions. As mentioned earlier, we first reduce the release problem to a linear reconstruction problem (Section 3.1), and then the attacks work by using either least squares or LP decoding techniques. The analysis requires understanding spectral and geometric properties of the constraint matrix that arises in these attacks, which we do in Section 3.2. In Section 4, we present our results on releasing M-estimators associated with differentiable loss functions. For clarity, we first discuss the attacks for the special cases of linear and logistic regression estimators (in Section 4.1), and then discuss the attacks for the general case (in Section 4.2).

2 Preliminaries 

Notation. We use $[n]$ to denote the set $\{1, \ldots, n\}$. $d_H(\cdot, \cdot)$ measures the Hamming distance. Vectors used in the paper are by default column vectors and are denoted by boldface letters. For a vector $v$, $v^T$ denotes its transpose, $\|v\|$ denotes its Euclidean norm, $\|v\|_1$ denotes its $\ell_1$ norm, and $\|v\|_\infty$ denotes its $\ell_\infty$ norm. For two vectors $v_1$ and $v_2$, $\langle v_1, v_2 \rangle$ denotes the inner product of $v_1$ and $v_2$. We use $(a)^n$ to denote a vector of length $n$ with all entries equal to $a$. For a matrix $M$, $\|M\|$ denotes the operator norm and $M_i$ denotes the $i$th row of $M$. Random matrices are denoted by boldface capitalized letters. We use $\mathrm{diag}(a_1, \ldots, a_n)$ to denote an $n \times n$ diagonal matrix with entries $a_1, \ldots, a_n$ along the main diagonal. The notation $\mathrm{vert}(\cdot, \ldots, \cdot)$ denotes vertical concatenation of the argument matrices.

Let $M$ be an $N \times n$ real matrix with $N \ge n$. The singular values $\sigma_j(M)$ are the eigenvalues of $\sqrt{M^T M}$ arranged in non-increasing order. Of particular importance in this paper is the least singular value $\sigma_n(M) = \inf_{z : \|z\| = 1} \|Mz\|$. The unit sphere in $n$ dimensions centered at the origin is denoted by $S^{n-1} = \{z : \|z\| = 1\}$.
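The least singular value can be computed numerically from an SVD. The NumPy sketch below (sizes, seed, and the random sign matrix are arbitrary illustrative choices) checks two facts stated above: $\sigma_n(M)$ equals the square root of the smallest eigenvalue of $M^T M$, and no unit vector $z$ gives $\|Mz\| < \sigma_n(M)$, with the last right singular vector attaining the infimum.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 300, 20
M = rng.choice([-1.0, 1.0], size=(N, n))           # random sign matrix with N >= n

sing_vals = np.linalg.svd(M, compute_uv=False)       # sigma_1 >= ... >= sigma_n
sigma_n = sing_vals[-1]                              # least singular value

# Singular values are the square roots of the eigenvalues of M^T M.
sigma_n_eig = np.sqrt(np.linalg.eigvalsh(M.T @ M).min())

# No unit vector does better than sigma_n; the last right singular vector attains it.
Z = rng.standard_normal((n, 1000))
Z /= np.linalg.norm(Z, axis=0)                       # 1000 random points on S^{n-1}
sampled_min = np.linalg.norm(M @ Z, axis=0).min()
_, _, Vt = np.linalg.svd(M)
attained = np.linalg.norm(M @ Vt[-1])
print(sigma_n, sigma_n_eig, sampled_min, attained)
```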

Our analysis uses random matrices, and we add a subscript of r to differentiate a random matrix from a 
non-random matrix. As mentioned earlier, $k$ is a constant in this paper and we often omit dependence on $k$
in our results. 

2.1 Background on Noisy Linear Systems. 

Noisy linear systems arise in a wide variety of statistical and signal-processing contexts. Suppose we are given a matrix $A$ and vector $z$ such that $z = As + e$, where $e$ is assumed to be "small" (in a sense defined below). A natural approach to estimating $s$ is to output $\hat{s} = \arg\min_{s} \|As - z\|_p$ for some $p \ge 1$. We will consider $p = 1$ and $2$; we summarize the assumptions and guarantees for each method below. When it is known that $s \in \{0,1\}^n$, the attacker can then round the entries of $\hat{s}$ to the nearer of $\{0,1\}$ to improve the estimate.

In the sequel, we call a vector $z \in \mathbb{R}^m$ $(a, b)$-small if at least a $1 - a$ fraction of its entries have magnitude less than $b$. In other words, for some set $S$ with $|S| \ge (1 - a) \cdot m$, it is the case that $|z_i| < b$ for all $i \in S$.
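The $(a, b)$-small condition translates directly into a one-line check. This plain-Python helper (the name `is_small` is ours) is only meant to pin down the quantifiers: an $(a, b)$-small vector may contain a few arbitrarily large entries, while a $(0, b)$-small vector may not.

```python
def is_small(z, a, b):
    """(a, b)-small: at least a (1 - a) fraction of the entries satisfy |z_i| < b."""
    good = sum(1 for zi in z if abs(zi) < b)
    return good >= (1 - a) * len(z)


e = [0.1, -0.2, 50.0, 0.3]
print(is_small(e, a=0.25, b=1.0))   # one gross error out of four is tolerated
print(is_small(e, a=0.0, b=1.0))    # (0, b)-small requires every entry below b
```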

$\ell_2$ error minimization ("least squares"). Widely used in regression problems, the least squares method guarantees a good approximation to $s$ when the Euclidean norm $\|e\|$ is small and $A$ has no small singular values. It was first used in data privacy by Dwork and Yekhanin [7]. For completeness, we present the entire analysis of this attack (in a general setting) here.

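Let $A = P \Sigma Q^T$ be the singular value decomposition of $A$. Here, $P$ is an orthogonal $m \times m$ matrix, $\Sigma$ is a diagonal $m \times n$ matrix, and $Q$ is an orthogonal $n \times n$ matrix. Let $0_{n \times (m-n)}$ be an $n \times (m - n)$ matrix with all entries zero. Define

$$\Sigma_{\mathrm{inv}} = \left(\mathrm{diag}(\sigma_1(A)^{-1}, \ldots, \sigma_n(A)^{-1}) \mid 0_{n \times (m-n)}\right) \quad \text{and} \quad A_{\mathrm{inv}} = Q \Sigma_{\mathrm{inv}} P^T,$$

so that $A_{\mathrm{inv}} z = \arg\min_s \|As - z\|$. The attacker outputs $\hat{s}$ obtained by rounding each entry of $A_{\mathrm{inv}} z$ to the closer of 0 and 1.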

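Now the claim is that $\hat{s}$ is a good reconstruction of $s$. The idea behind the analysis is that $A_{\mathrm{inv}} z = s + A_{\mathrm{inv}} e$. Now

$$\|A_{\mathrm{inv}} e\| = \|Q \Sigma_{\mathrm{inv}} P^T e\| = \|\Sigma_{\mathrm{inv}} P^T e\| \le \frac{\|P^T e\|}{\sigma_n(A)} = \frac{\|e\|}{\sigma_n(A)}$$

(as $P$ and $Q$ are orthogonal matrices, they don't affect the norms).

Let us assume that $\sigma_n(A) = \sigma$. If (the absolute value of) all the entries in $e$ are less than $\beta$, then $\|e\| \le \beta\sqrt{m}$, and therefore $\|A_{\mathrm{inv}} e\| \le \beta\sqrt{m}/\sigma$. Since each entry of $A_{\mathrm{inv}} e$ with absolute value above $1/2$ contributes more than $1/4$ to $\|A_{\mathrm{inv}} e\|^2 \le m\beta^2/\sigma^2$, the vector $A_{\mathrm{inv}} e$ cannot have more than $4m\beta^2/\sigma^2$ such entries, and therefore the Hamming distance between $\hat{s}$ and $s$ is at most $4m\beta^2/\sigma^2$ (as the adversary only fails to recover those entries of $s$ whose corresponding $A_{\mathrm{inv}} e$ entries are greater than $1/2$ in absolute value). The time complexity of the attack is dominated by the cost of computing the singular value decomposition of $A$, which takes $O(mn^2)$ time.⁵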

Theorem 2.1. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a full rank linear map ($m \ge n$) such that the least singular value of $A$ is $\sigma$. Then if $e$ is $(0, \beta)$-small (that is, if $\|e\|_\infty < \beta$), the vector $\arg\min_s \|As - z\|$, rounded to $\{0,1\}^n$, satisfies $d_H(\hat{s}, s) \le 4m\beta^2/\sigma^2$. In particular, if $\sigma = \Omega(\sqrt{m})$ and $\beta = o(\sqrt{n})$, then $\hat{s}$ agrees with a $1 - o(1)$ fraction of $s$. The attack runs in $O(mn^2)$ time.
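The attack of Theorem 2.1 can be sketched in a few lines of NumPy. This is an illustrative implementation under our own choices of sizes, seed, and a random $\pm 1$ matrix for $A$; it builds $A_{\mathrm{inv}}$ from the thin SVD (which simply omits the zero block of $\Sigma_{\mathrm{inv}}$), rounds, and compares the number of wrong bits with the bound $4m\beta^2/\sigma^2$.

```python
import numpy as np


def least_squares_attack(A, z):
    """Round A_inv z to {0,1}^n, where A_inv = Q Sigma_inv P^T is built from the thin SVD."""
    P, sing, Qt = np.linalg.svd(A, full_matrices=False)   # A = P diag(sing) Qt
    A_inv = Qt.T @ np.diag(1.0 / sing) @ P.T              # pseudo-inverse of A
    s_hat = (A_inv @ z >= 0.5).astype(int)                # round to the nearer of 0 and 1
    return s_hat, sing[-1]                                 # also report sigma_n(A)


rng = np.random.default_rng(0)
m, n, beta = 400, 40, 1.0
A = rng.choice([-1.0, 1.0], size=(m, n))
s = rng.integers(0, 2, size=n)
e = rng.uniform(-beta, beta, size=m)                      # e is (0, beta)-small
s_hat, sigma = least_squares_attack(A, A @ s + e)
hamming = int(np.sum(s_hat != s))
bound = 4 * m * beta**2 / sigma**2                        # Theorem 2.1
print(f"wrong bits: {hamming}, bound: {bound:.1f}")
```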

$\ell_1$ error minimization ("LP decoding"). In the context of privacy, the "LP decoding" approach was first used by Dwork et al. [6]. (The name stems from the fact that the minimization problem can be cast as a linear program.) The LP attack is slower than the least squares attack but can handle considerably more complex error patterns at the cost of a stronger assumption on $A$. Recently, De [1] gave a simple analysis of this attack based on the geometry of the operator $A$. We need the following definition of a Euclidean section.

Definition 2. A linear operator $A : \mathbb{R}^n \to \mathbb{R}^m$ is said to be an $\alpha$-Euclidean section if for all $s \in \mathbb{R}^n$,

$$\sqrt{m}\,\|As\| \ge \|As\|_1 \ge \alpha\sqrt{m}\,\|As\|.$$

Note that by Cauchy-Schwarz, the first inequality, $\sqrt{m}\|As\| \ge \|As\|_1$, always holds. We remark that when we say $A$ is Euclidean, we simply mean that there is some constant $\alpha > 0$ such that $A$ is $\alpha$-Euclidean.

The following theorem gives a sufficient condition under which LP decoding gives a good approximation to $s$. The time bound here was derived from the LP algorithm of Vaidya [21], which uses $O(((N_1 + N_2)N_2^2 + (N_1 + N_2)^{1.5}N_2)N_3)$ arithmetic operations, where $N_1$ is the number of constraints, $N_2$ is the number of variables, and $N_3$ is a bound on the number of bits used to describe the entries of $A$. In our setting, the LP has $n$ variables and $m$ constraints, and $N_3$ could be upper bounded by $mn$; therefore the LP could be solved in $O(m^2n^3 + m^{2.5}n^2)$ time.

Theorem 2.2 (From [1]). Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a full rank linear map ($m \ge n$) such that the least singular value of $A$ is $\sigma$. Further, let $A$ be an $\alpha$-Euclidean section. Then there exists $\gamma = \gamma(\alpha)$ such that if $e$ is $(\gamma, \beta)$-small, then any solution $\hat{s} = \arg\min_s \|As - z\|_1$, rounded to $\{0,1\}^n$, satisfies $d_H(\hat{s}, s) \le O(\beta\sqrt{mn}/\sigma)$, where the constant inside the $O(\cdot)$ notation depends on $\alpha$. In particular, if $\sigma = \Omega(\sqrt{m})$ and $\beta = o(\sqrt{n})$, then this attack recovers a $1 - o(1)$ fraction of $s$. The attack runs in $O(m^2n^3 + m^{2.5}n^2)$ time.

⁵SVD decomposition of an $N_1 \times N_2$ sized matrix with $N_1 \ge N_2$ can be done in $O(N_1 N_2^2)$ time.
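The Euclidean-section ratio $\|As\|_1/(\sqrt{m}\,\|As\|)$ is easy to probe numerically. The NumPy sketch below samples random directions $s$ for a random $\pm 1$ matrix; sampling can only suggest, not certify, a value of $\alpha$, since Definition 2 quantifies over all $s$. Matrix sizes and the number of samples are arbitrary choices. For typical directions the ratio tends to sit near $\sqrt{2/\pi} \approx 0.80$, the value for a vector with Gaussian-like entries.

```python
import numpy as np


def euclidean_ratio(A, s):
    """||A s||_1 / (sqrt(m) ||A s||): at most 1 by Cauchy-Schwarz, and at least alpha
    for every s when A is an alpha-Euclidean section."""
    v = A @ s
    return np.abs(v).sum() / (np.sqrt(v.size) * np.linalg.norm(v))


rng = np.random.default_rng(0)
m, n = 2000, 20
A = rng.choice([-1.0, 1.0], size=(m, n))
ratios = [euclidean_ratio(A, rng.standard_normal(n)) for _ in range(200)]
print(f"min {min(ratios):.3f}, max {max(ratios):.3f}, sqrt(2/pi) = {np.sqrt(2 / np.pi):.3f}")
```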
3 Releasing Evaluations of Non-Degenerate Boolean Functions



In this section, we analyze privacy lower bounds for releasing evaluations of non-degenerate boolean functions. We use the following standard definition of a representing polynomial (see Appendix A for background about representing boolean functions as multilinear polynomials).

Definition 3. A polynomial $P_f$ over the reals represents a function $f$ over $\{0,1\}^{k+1}$ if $f(x_1, \ldots, x_{k+1}) = P_f(x_1, \ldots, x_{k+1})$ for all $(x_1, \ldots, x_{k+1}) \in \{0,1\}^{k+1}$.

Definition 4. A function $f : \{0,1\}^{k+1} \to \{0,1\}$ is non-degenerate iff it can be written as a multilinear polynomial of degree $k + 1$.

If $f$ is non-degenerate, then it depends on all of its $k + 1$ variables. Note that non-degenerate functions constitute a large class of functions.⁶ For example, the class includes widely used boolean functions like AND, OR, XOR, MAJORITY, and depth $k + 1$ decision trees [15].
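Non-degeneracy can be tested mechanically: the coefficient of the top monomial $x_1 x_2 \cdots x_p$ in the multilinear representation of $f : \{0,1\}^p \to \{0,1\}$ is $\sum_{x} (-1)^{p - |x|} f(x)$ (Möbius inversion), and $f$ is non-degenerate exactly when this coefficient is nonzero. The plain-Python sketch below (helper names are ours) applies the test to some of the functions named above and, for $p = 2, 3$, counts the non-degenerate functions by brute force; the counts 10 and 186 agree with $2^{2^p} - \binom{2^p}{2^{p-1}}$.

```python
import itertools
from math import comb


def top_coefficient(f, p):
    """Coefficient of x_1 x_2 ... x_p in the multilinear polynomial representing
    f : {0,1}^p -> {0,1}, via Moebius inversion: sum over x of (-1)^(p - |x|) f(x)."""
    return sum((-1) ** (p - sum(x)) * f(x) for x in itertools.product((0, 1), repeat=p))


def is_non_degenerate(f, p):
    """Non-degenerate iff the multilinear representation has degree exactly p."""
    return top_coefficient(f, p) != 0


def count_non_degenerate(p):
    """Brute force over all 2^(2^p) truth tables (only feasible for small p)."""
    points = list(itertools.product((0, 1), repeat=p))
    count = 0
    for table in itertools.product((0, 1), repeat=len(points)):
        lookup = dict(zip(points, table))
        count += is_non_degenerate(lambda x: lookup[x], p)
    return count


AND3 = lambda x: int(all(x))
XOR3 = lambda x: x[0] ^ x[1] ^ x[2]
MAJ3 = lambda x: int(sum(x) >= 2)
DICTATOR2 = lambda x: x[0]                 # ignores its second input
print([is_non_degenerate(h, 3) for h in (AND3, XOR3, MAJ3)], is_non_degenerate(DICTATOR2, 2))
for p in (2, 3):
    print(p, count_non_degenerate(p), 2 ** (2 ** p) - comb(2 ** p, 2 ** (p - 1)))
```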

Problem Statement. Let $f : \{0,1\}^{k+1} \to \{0,1\}$ represent the function that we want to evaluate on a database $D$. Let $D = (U \mid s) \in (\{0,1\})^{n \times (d+1)}$, where $U \in (\{0,1\})^{n \times d}$ and $s \in \{0,1\}^n$. Let $U = (\delta_{ij})$, i.e., $\delta_{ij}$ denotes the $(i, j)$th entry in $U$ (with $1 \le i \le n$ and $1 \le j \le d$). Let

$$J = (j_1, \ldots, j_k) \in \{1, \ldots, d\}^k, \qquad \text{where } \{1, \ldots, d\}^k = \underbrace{\{1, \ldots, d\} \times \cdots \times \{1, \ldots, d\}}_{k \text{ times}}.$$

Note that $J$ allows repeated entries.⁷ Let $D|_J$ be the submatrix of $D$ restricted to the columns indexed by $J$ (together with the sensitive column $s$). For a fixed $J$, define $F(D|_J)$ as

$$F(D|_J) = \sum_{i=1}^{n} f(\delta_{i,j_1}, \ldots, \delta_{i,j_k}, s_i), \qquad J = (j_1, \ldots, j_k).$$

Note that $F(D|_J)$ is an integer between 0 and $n$. Let $\Sigma_f(D)$ be the vector obtained by computing $F$ on all different $D|_J$'s: $\Sigma_f(D) = (F(D|_J))$ where $J \in \{1, \ldots, d\}^k$. Note that $\Sigma_f(D)$ is a vector of length $d^k$. The goal is to understand how much noise is needed to attribute privately release $\Sigma_f(D)$ (or $\Sigma_f(D)/n$) when $f$ is non-degenerate.
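For small instances, $\Sigma_f(D)$ can be computed directly from its definition. The sketch below (NumPy plus `itertools`) uses 0-based indices and enumerates the tuples $J$ in lexicographic order, which is our choice of row ordering; MAJORITY on three bits serves as an example non-degenerate $f$ with $k = 2$, and the sizes and seed are arbitrary.

```python
import itertools

import numpy as np


def sigma_f(f, U, s, k):
    """Sigma_f(D) for D = (U | s): one entry F(D|_J) per J in {0, ..., d-1}^k."""
    n, d = U.shape
    return np.array([
        sum(f(*(U[i, j] for j in J), s[i]) for i in range(n))   # F(D|_J)
        for J in itertools.product(range(d), repeat=k)          # repeated indices allowed
    ])


maj3 = lambda a, b, c: int(a + b + c >= 2)   # a non-degenerate f with k + 1 = 3 inputs
rng = np.random.default_rng(0)
n, d, k = 8, 4, 2
U = rng.integers(0, 2, size=(n, d))
s = rng.integers(0, 2, size=n)
release = sigma_f(maj3, U, s, k)
print(release.shape, release.min(), release.max())
```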

Our Results. We prove the following results using the $\ell_2$ and $\ell_1$ error minimization attacks outlined in Section 2.1.

Theorem 3.1 (Informal Statements). Let $f : \{0,1\}^{k+1} \to \{0,1\}$ be a non-degenerate boolean function. Then

1. any mechanism which for every database $D \in (\{0,1\})^{n \times (d+1)}$ with $n \ll d^k$ releases $\Sigma_f(D)$ by adding $o(\sqrt{n})$ noise (or releases $\Sigma_f(D)/n$ by adding $o(1/\sqrt{n})$ noise) to each entry is attribute non-private. The attack that achieves this non-privacy violation runs in $O(d^k n^2)$ time.

2. there exists a constant $\gamma > 0$ such that any mechanism which for every database $D \in (\{0,1\})^{n \times (d+1)}$ with $n \ll d^k$ releases $\Sigma_f(D)$ by adding $o(\sqrt{n})$ noise (or releases $\Sigma_f(D)/n$ by adding $o(1/\sqrt{n})$ noise) to at most a $1 - \gamma$ fraction of the entries is attribute non-private. The attack that achieves this non-privacy violation runs in $O(d^{2k} n^3 + d^{2.5k} n^2)$ time.

⁶A simple counting argument shows that among the $2^{2^{k+1}}$ boolean functions over $k + 1$ variables, $2^{2^{k+1}} - \binom{2^{k+1}}{2^k}$ are non-degenerate.

⁷We allow repeated entries for convenience of notation. Our results also hold if we use the more natural $J \subseteq [d]$, $|J| = k$.

For convenience of notation, in this section, we work mostly with the transpose of $U$. Let $T = U^T$. So $T$ is a $d \times n$ matrix.

3.1 Reducing to a Linear Reconstruction Problem. 

In this section, we reduce the problem of releasing $\Sigma_f(D)$ for a database $D$ into a linear reconstruction problem. First, we define a simple decomposition of boolean functions. Consider a non-degenerate boolean function $f : \{0,1\}^{k+1} \to \{0,1\}$. Then there exist two functions $f_0 : \{0,1\}^k \to \{0,1\}$ and $f_1 : \{0,1\}^k \to \{0,1\}$ such that

$$f(\delta_1, \ldots, \delta_{k+1}) = f_0(\delta_1, \ldots, \delta_k)(1 - \delta_{k+1}) + f_1(\delta_1, \ldots, \delta_k)\,\delta_{k+1} \qquad \forall (\delta_1, \ldots, \delta_{k+1}) \in \{0,1\}^{k+1}.$$

This can be re-expressed as

$$f(\delta_1, \ldots, \delta_{k+1}) = f_0(\delta_1, \ldots, \delta_k) + \left(f_1(\delta_1, \ldots, \delta_k) - f_0(\delta_1, \ldots, \delta_k)\right)\delta_{k+1}.$$

Define $f_2(\delta_1, \ldots, \delta_k) = f_1(\delta_1, \ldots, \delta_k) - f_0(\delta_1, \ldots, \delta_k)$. Therefore,

$$f(\delta_1, \ldots, \delta_{k+1}) = f_0(\delta_1, \ldots, \delta_k) + f_2(\delta_1, \ldots, \delta_k)\,\delta_{k+1}. \qquad (1)$$

Note that $f_2$ is a function from $\{0,1\}^k \to \{-1,0,1\}$. Since $f_0$ and $f_1$ are both boolean functions and can be represented as multilinear polynomials over the variables $\delta_1, \ldots, \delta_k$, $f_2$ can also be represented as a multilinear polynomial over the variables $\delta_1, \ldots, \delta_k$. Since $f$ is represented by a multilinear polynomial of degree $k + 1$, the multilinear polynomial representing $f_2$ has degree $k$ (if it had any lower degree, then $f$ could be represented as a multilinear polynomial of degree strictly less than $k + 1$, which is a contradiction). To aid our construction, we need to define a particular function of matrices.
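The decomposition in Equation (1) is mechanical: $f_0$ is $f$ with the last input fixed to 0, and $f_2 = f_1 - f_0$. The plain-Python sketch below (helper names are ours) computes both as lookup tables for MAJORITY on three bits, where $f_0$ is AND, $f_1$ is OR, and $f_2 = \mathrm{OR} - \mathrm{AND}$ coincides with XOR, whose multilinear form $\delta_1 + \delta_2 - 2\delta_1\delta_2$ has degree $k = 2$, as the argument above predicts.

```python
import itertools


def decompose(f, k):
    """Tables for f_0 and f_2 on {0,1}^k with f(x, t) = f_0(x) + f_2(x) * t (Equation (1))."""
    f0, f2 = {}, {}
    for x in itertools.product((0, 1), repeat=k):
        f0[x] = f(*x, 0)               # f_0: last input fixed to 0
        f2[x] = f(*x, 1) - f(*x, 0)    # f_2 = f_1 - f_0, with values in {-1, 0, 1}
    return f0, f2


def top_coefficient(table, k):
    """Coefficient of x_1 ... x_k in the multilinear representation of a table on {0,1}^k."""
    return sum((-1) ** (k - sum(x)) * v for x, v in table.items())


maj3 = lambda a, b, c: int(a + b + c >= 2)
f0, f2 = decompose(maj3, 2)
print(f0)                               # AND of the first two inputs
print(f2, top_coefficient(f2, 2))       # XOR, whose top coefficient is nonzero
```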

Definition 5 (Row Function Matrix). Let $h$ be a function from $\{0,1\}^k \to \{-1,0,1\}$. Let $T^{(1)} = (\delta^{(1)}_{ij}), T^{(2)} = (\delta^{(2)}_{ij}), \ldots, T^{(k)} = (\delta^{(k)}_{ij})$ be $k$ matrices with $\{0,1\}$ entries and dimensions $d \times n$. Define a row function matrix (of dimension $d^k \times n$) $\Pi_h(T^{(1)}, \ldots, T^{(k)})$ as follows. Any row of this matrix will correspond to a sequence

$$J = (j_1, j_2, \ldots, j_k) \in \{1, \ldots, d\}^k$$

of $k$ numbers, so the entries of $\Pi_h(T^{(1)}, \ldots, T^{(k)})$ are denoted⁸ by $\pi_{J,a}$, where $a \in \{1, \ldots, n\}$. For $J = (j_1, j_2, \ldots, j_k)$ the entries of the matrix $\Pi_h(T^{(1)}, \ldots, T^{(k)})$ are defined by the relation

$$\pi_{J,a} = h\left(\delta^{(1)}_{j_1,a}, \delta^{(2)}_{j_2,a}, \ldots, \delta^{(k)}_{j_k,a}\right).$$

The row product matrices from [11] (see Definition 6) are a particular example of this construction where the function $h(\delta_1, \delta_2, \ldots, \delta_k) = \delta_1 \cdot \delta_2 \cdots \delta_k$, which implies that $\Pi_h(T^{(1)}, \ldots, T^{(k)}) = T^{(1)} \odot \cdots \odot T^{(k)}$, where $\odot$ is the row-product operator from Definition 6.
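Definition 5 can be transcribed almost literally. In this NumPy sketch, the 0-based indices, the lexicographic row order, and the helper name `row_function_matrix` are our own conventions; choosing $h$ to be the product recovers the row-product matrix, whose row for $J = (j_1, j_2)$ is the entrywise product of row $j_1$ of $T^{(1)}$ and row $j_2$ of $T^{(2)}$.

```python
import itertools

import numpy as np


def row_function_matrix(h, *Ts):
    """Pi_h(T^(1), ..., T^(k)) for d x n matrices T^(r): the row for J = (j_1, ..., j_k)
    (0-based, lexicographic order) has entries h(T^(1)[j_1, a], ..., T^(k)[j_k, a])."""
    d, n = Ts[0].shape
    rows = []
    for J in itertools.product(range(d), repeat=len(Ts)):
        selected = [T[j] for T, j in zip(Ts, J)]             # row j_r of T^(r)
        rows.append([h(*column) for column in zip(*selected)])
    return np.array(rows)


rng = np.random.default_rng(0)
T1 = rng.integers(0, 2, size=(3, 5))
T2 = rng.integers(0, 2, size=(3, 5))
Pi = row_function_matrix(lambda a, b: a * b, T1, T2)   # h = product gives the row product
print(Pi.shape)
```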

⁸The definition assumes a certain order of the rows of the matrix $\Pi_h(T^{(1)}, \ldots, T^{(k)})$. This order, however, is not important for our analysis. Note that changing the relative positions of rows of a matrix doesn't affect its singular values (equivalently, the eigenvalues of $M^T M$).
Let $D = (U \mid s)$ be a database, and let $T = U^T$. Let $U = (\delta_{ij})$. Consider any fixed $J = (j_1, \ldots, j_k) \in \{1, \ldots, d\}^k$. For this $J$, there exists an entry in $\Sigma_f(D)$ equaling $\sum_{i=1}^n f(\delta_{i,j_1}, \ldots, \delta_{i,j_k}, s_i)$. Now consider the matrices $\Pi_{f_0}(T, \ldots, T)$ and $\Pi_{f_2}(T, \ldots, T)$, and the rows in these two matrices corresponding to this $J$. Let this be the $\ell$th row in these matrices. Then the $\ell$th row of the matrix $\Pi_{f_0}(T, \ldots, T)$ has $n$ entries equaling $f_0(\delta_{i,j_1}, \ldots, \delta_{i,j_k})$ and the $\ell$th row of the matrix $\Pi_{f_2}(T, \ldots, T)$ has $n$ entries equaling $f_2(\delta_{i,j_1}, \ldots, \delta_{i,j_k})$, for $i = 1, \ldots, n$. Since

$$f(\delta_{i,j_1}, \ldots, \delta_{i,j_k}, s_i) = f_0(\delta_{i,j_1}, \ldots, \delta_{i,j_k}) + f_2(\delta_{i,j_1}, \ldots, \delta_{i,j_k})\, s_i,$$

it follows that

$$\sum_{i=1}^n f(\delta_{i,j_1}, \ldots, \delta_{i,j_k}, s_i) = \sum_{i=1}^n f_0(\delta_{i,j_1}, \ldots, \delta_{i,j_k}) + \sum_{i=1}^n f_2(\delta_{i,j_1}, \ldots, \delta_{i,j_k})\, s_i = \langle \Pi_{f_0}(T, \ldots, T)_\ell, 1_n \rangle + \langle \Pi_{f_2}(T, \ldots, T)_\ell, s \rangle,$$

where $\Pi_{f_0}(T, \ldots, T)_\ell$ and $\Pi_{f_2}(T, \ldots, T)_\ell$ denote the $\ell$th row of the matrices $\Pi_{f_0}(T, \ldots, T)$ and $\Pi_{f_2}(T, \ldots, T)$ respectively, and $1_n$ denotes the vector $(1)^n$. Now define a vector $H_f(D)$ whose $\ell$th element ($1 \le \ell \le d^k$) is

$$H_f(D)_\ell = \langle \Pi_{f_0}(T, \ldots, T)_\ell, 1_n \rangle + \langle \Pi_{f_2}(T, \ldots, T)_\ell, s \rangle. \qquad (2)$$

The length of the vector $H_f(D)$ is $d^k$. The above arguments show that all the entries of $H_f(D)$ are contained in the vector $\Sigma_f(D)$. Since every row in $\Sigma_f(D)$ corresponds to some $J$, it also follows that all the entries of $\Sigma_f(D)$ are contained in the vector $H_f(D)$, implying the following claim.

Claim 1. $\Sigma_f(D) = H_f(D)$.⁹
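Claim 1 can be sanity-checked numerically on a toy database. The NumPy sketch below, again with MAJORITY on three bits ($k = 2$, so $f_0$ is AND and $f_2$ is XOR) and with arbitrary sizes and seed, computes $\Sigma_f(D)$ from the definition and $H_f(D)$ from Equation (2), using the same lexicographic row ordering for both (our convention).

```python
import itertools

import numpy as np


def pi(h, T, k):
    """Row function matrix Pi_h(T, ..., T) with k copies of the d x n matrix T."""
    d, n = T.shape
    return np.array([[h(*(T[j, a] for j in J)) for a in range(n)]
                     for J in itertools.product(range(d), repeat=k)])


maj3 = lambda a, b, c: int(a + b + c >= 2)
f0 = lambda a, b: maj3(a, b, 0)                      # AND
f2 = lambda a, b: maj3(a, b, 1) - maj3(a, b, 0)      # XOR

rng = np.random.default_rng(0)
n, d, k = 15, 4, 2
U = rng.integers(0, 2, size=(n, d))
s = rng.integers(0, 2, size=n)
T = U.T

# Sigma_f(D) straight from its definition, rows in the same order as pi(...)
sigma = np.array([sum(maj3(*(U[i, j] for j in J), s[i]) for i in range(n))
                  for J in itertools.product(range(d), repeat=k)])
# H_f(D) from Equation (2)
H = pi(f0, T, k) @ np.ones(n) + pi(f2, T, k) @ s
print(np.allclose(sigma, H))
```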

Setting up the Least Squares Attack. The privacy mechanism releases a noisy approximation to $\Sigma_f(D)$. Let $y = (y_1, \ldots, y_{d^k})$ be this noisy approximation. The adversary tries to reconstruct an approximation of $s$ from $y$. Let $b_\ell = \langle \Pi_{f_0}(T, \ldots, T)_\ell, 1_n \rangle$, and $b_f = (b_1, \ldots, b_{d^k})$. Given $y$, the adversary solves the following linear reconstruction problem:

$$y = b_f + \Pi_{f_2}(T, \ldots, T)\, s. \qquad (3)$$

In the setting of attribute non-privacy the adversary knows $T$, and therefore can compute $\Pi_{f_2}(T, \ldots, T)$ and $\Pi_{f_0}(T, \ldots, T)$ (hence, $b_f$). The goal of the adversary is to compute a large fraction of $s$ given $y$. The definition of the iterated logarithm ($\log_{(q)}$) is given in Definition 8. In the analysis below, we use a random matrix $\mathbf{T}$ and the least singular value lower bound on a random row function matrix from Theorem 3.9. We use boldface letters to denote random matrices.

Theorem 3.2 (Part 1, Theorem 3.1). Let $f : \{0,1\}^{k+1} \to \{0,1\}$ be a non-degenerate boolean function and $n \le c\, d^k / \log_{(q)} d$ for a constant $c$ depending only on $k$ and $q$. Any privacy mechanism which for every database $D \in (\{0,1\})^{n \times (d+1)}$ releases $\Sigma_f(D)$ by adding $o(\sqrt{n})$ noise (or releases $\Sigma_f(D)/n$ by adding $o(1/\sqrt{n})$ noise) to each entry is attribute non-private. The attack that achieves this non-privacy violation runs in $O(d^k n^2)$ time.

Proof. Consider the least squares attack outlined in Theorem 2.1 on Equation (3). Let $\mathbf{T}$ be a random matrix of dimension $d \times n$ with independent Bernoulli entries taking values 0 and 1 with probability 1/2, and let the database be $D = (\mathbf{T}^T \mid s)$ for some $s \in \{0,1\}^n$. For analyzing the attack in Theorem 2.1, we need a lower bound on the least singular value of $\Pi_{f_2}(\mathbf{T}, \ldots, \mathbf{T})$. The following claim follows from Theorem 3.9. Note that $f_2$ is a function over $k$ variables.

⁹Under proper ordering of both the vectors.
Claim 2. For the function $f_2$ defined above and $n \le c\, d^k / \log_{(q)} d$ (where $c$ is the constant from Theorem 3.8), the matrix $\Pi_{f_2}(\mathbf{T}, \ldots, \mathbf{T})$ satisfies

$$\Pr\left[\sigma_n(\Pi_{f_2}(\mathbf{T}, \ldots, \mathbf{T})) \le C'\sqrt{d^k}\right] \le c_1 \exp(-c_2 d).$$

Proof. Apply Theorem 3.9 with function $h = f_2$. □

Claim 2 shows that with exponentially high probability $\sigma_n(\Pi_{f_2}(\mathbf{T}, \ldots, \mathbf{T})) = \Omega(\sqrt{d^k})$. Invoking Theorem 2.1 with $m = d^k$ and $\beta = o(\sqrt{n})$ shows that with exponentially high probability the adversary fails to recover only $o(n)$ entries of $s$. A noise bound of $o(\sqrt{n})$ for releasing $\Sigma_f(D)$ translates into a noise bound of $o(1/\sqrt{n})$ for releasing $\Sigma_f(D)/n$. □
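The least squares attack on Equation (3) can be simulated end to end. This NumPy sketch is illustrative only: the sizes ($n = 40$, $d = 20$, $k = 2$), the choice of MAJORITY as $f$, uniform noise of magnitude below $\beta = 1$, and the use of `np.linalg.lstsq` in place of an explicit SVD are our assumptions, and these sizes are far from the asymptotic regime of Theorem 3.2. The guarantee it checks is the deterministic bound of Theorem 2.1, evaluated with the empirically computed $\sigma_n$.

```python
import itertools

import numpy as np


def pi(h, T, k):
    """Row function matrix Pi_h(T, ..., T) with k copies of the d x n matrix T."""
    d, n = T.shape
    return np.array([[h(*(T[j, a] for j in J)) for a in range(n)]
                     for J in itertools.product(range(d), repeat=k)], dtype=float)


maj3 = lambda a, b, c: int(a + b + c >= 2)
f0 = lambda a, b: maj3(a, b, 0)
f2 = lambda a, b: maj3(a, b, 1) - maj3(a, b, 0)

rng = np.random.default_rng(0)
n, d, k, beta = 40, 20, 2, 1.0                      # m = d^k = 400 released counts
T = rng.integers(0, 2, size=(d, n))                 # random Bernoulli(1/2) T = U^T
s = rng.integers(0, 2, size=n)

P0, P2 = pi(f0, T, k), pi(f2, T, k)
b_f = P0 @ np.ones(n)
exact = b_f + P2 @ s                                # Sigma_f(D) by Claim 1
y = exact + rng.uniform(-beta, beta, size=exact.size)   # noisy release

x, *_ = np.linalg.lstsq(P2, y - b_f, rcond=None)    # least squares decoding of Eq. (3)
s_hat = (x >= 0.5).astype(int)
hamming = int(np.sum(s_hat != s))
sigma_n = np.linalg.svd(P2, compute_uv=False)[-1]
bound = 4 * P2.shape[0] * beta**2 / sigma_n**2      # guarantee from Theorem 2.1
print(f"wrong bits: {hamming} of {n}; Theorem 2.1 bound: {bound:.1f}")
```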



Setting up the LP Decoding Attack. The LP decoding attack solves a slightly different reconstruction problem than Equation (3). The reason is that $\Pi_{f_2}(T, \ldots, T)$ is not a Euclidean section¹⁰ (a property needed for applying Theorem 2.2). However, we show that a related reconstruction problem has all the properties needed for the LP decoding attack. The analysis goes via matrices with $\{1, -1\}$ entries, which have the desired properties. We establish the Euclidean section property in Appendix B.

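Let $D = (U|s)$ be a database, and let $T = U^\top$. Let $V = 2T - 1_{d\times n}$, where $1_{d\times n}$ is the $d \times n$ matrix of all 1's. Define $g: \{-1,1\}^{k+1} \to \{-1,1\}$ as

$$g(\phi_1,\dots,\phi_{k+1}) = 2 f\left(\frac{1+\phi_1}{2},\dots,\frac{1+\phi_{k+1}}{2}\right) - 1. \tag{4}$$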

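We can decompose $g$ as

$$g(\phi_1,\dots,\phi_{k+1}) = \frac{1+\phi_{k+1}}{2}\, g_1(\phi_1,\dots,\phi_k) + \frac{1-\phi_{k+1}}{2}\, g_{-1}(\phi_1,\dots,\phi_k),$$

where $g_1: \{-1,1\}^k \to \{-1,1\}$ and $g_{-1}: \{-1,1\}^k \to \{-1,1\}$. Using the notation $g_1 = g_1(\phi_1,\dots,\phi_k)$ and $g_{-1} = g_{-1}(\phi_1,\dots,\phi_k)$, we get

$$g(\phi_1,\dots,\phi_{k+1}) = \frac{g_1 + g_{-1}}{2} + \phi_{k+1}\,\frac{g_1 - g_{-1}}{2}.$$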

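Define $g_2 = g_2(\phi_1,\dots,\phi_k) = (g_1 - g_{-1})/2$ and $g_3 = g_3(\phi_1,\dots,\phi_k) = (g_1 + g_{-1})/2$. Let us denote $\delta_i = (1+\phi_i)/2$. Using the decomposition of $f$ from Equation (1),

$$f\left(\frac{1+\phi_1}{2},\dots,\frac{1+\phi_{k+1}}{2}\right) = f(\delta_1,\dots,\delta_{k+1}) = f_0(\delta_1,\dots,\delta_k) + f_2(\delta_1,\dots,\delta_k)\,\delta_{k+1}.$$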

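Using the notation $f_0 = f_0(\delta_1,\dots,\delta_k)$ and $f_2 = f_2(\delta_1,\dots,\delta_k)$, and substituting the decompositions of $f$ and $g$ into Equation (4), gives

$$g_3 + g_2\phi_{k+1} = 2\left(f_0 + f_2\delta_{k+1}\right) - 1, \quad\text{i.e.,}\quad f_0 + f_2\delta_{k+1} = \frac{1}{2}\left(g_3 + g_2\phi_{k+1} + 1\right). \tag{5}$$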
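Equation (5) is a pointwise identity, so it can be sanity-checked by brute force. The sketch below assumes that the decomposition of Equation (1) is $f(\delta_1,\dots,\delta_{k+1}) = f_0(\delta_1,\dots,\delta_k) + f_2(\delta_1,\dots,\delta_k)\,\delta_{k+1}$ with $f_0 = f(\cdot,0)$ and $f_2 = f(\cdot,1) - f(\cdot,0)$; this is our reading of that equation, not a quotation of it. It enumerates every boolean function of $k+1 = 3$ variables, since the identity itself does not need non-degeneracy.

```python
import itertools


def check_equation_5(f, k):
    """True iff f_0 + f_2*delta_{k+1} == (g_3 + g_2*phi_{k+1} + 1)/2 on {-1,1}^{k+1}."""
    def g(phis):
        # Equation (4): g(phi) = 2 f((1+phi_1)/2, ..., (1+phi_{k+1})/2) - 1
        return 2 * f(tuple((1 + p) // 2 for p in phis)) - 1

    for phis in itertools.product((-1, 1), repeat=k):
        deltas = tuple((1 + p) // 2 for p in phis)
        g1, gm1 = g(phis + (1,)), g(phis + (-1,))
        g2, g3 = (g1 - gm1) / 2, (g1 + gm1) / 2
        f0 = f(deltas + (0,))                        # assumed reading of Equation (1)
        f2 = f(deltas + (1,)) - f(deltas + (0,))
        for phi_last in (-1, 1):
            delta_last = (1 + phi_last) // 2
            if f0 + f2 * delta_last != (g3 + g2 * phi_last + 1) / 2:
                return False
    return True


k = 2
points = list(itertools.product((0, 1), repeat=k + 1))
all_ok = True
for table in itertools.product((0, 1), repeat=len(points)):   # all 256 functions
    truth = dict(zip(points, table))
    all_ok = all_ok and check_equation_5(truth.__getitem__, k)
```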

If we take a matrix $T$ of dimension $d \times n$ with independent Bernoulli entries taking values $0$ and $1$ with probability $1/2$, the resulting matrix $\Pi_{f_2}(T,\dots,T)$ is not a Euclidean section. This is because the matrix $T$ is non-centered (the expectation of each entry in the matrix is $1/2$), which makes $\|\Pi_{f_2}(T,\dots,T)\|$ of order $d^k$ instead of $\sqrt{d^k}$.
Claim 3. Let $D = (U|s)$, $T = U^\top$, and $V = 2T - 1_{d\times n}$. Then

$$\Sigma_f(D)_i = \frac{1}{2}\left(\langle \Pi_{g_3}(V,\dots,V)_i, 1_n\rangle + \langle \Pi_{g_2}(V,\dots,V)_i, 2s - 1_n\rangle + n\right).$$
Proof. Note that from the earlier established decomposition,

$$\Sigma_f(D)_i = \langle \Pi_{f_0}(T,\dots,T)_i, 1_n\rangle + \langle \Pi_{f_2}(T,\dots,T)_i, s\rangle.$$

The remainder of the proof follows by using Equation (5) along with the definition of row function matrices (Definition 5). □

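Define

$$q_{g_i} = \frac{\langle \Pi_{g_3}(V,\dots,V)_i, 1_n\rangle}{2} - \frac{\langle \Pi_{g_2}(V,\dots,V)_i, 1_n\rangle}{2} + \frac{n}{2}.$$

Then $\Sigma_f(D)_i = q_{g_i} + \langle \Pi_{g_2}(V,\dots,V)_i, s\rangle$. Let $q_g = (q_{g_1},\dots,q_{g_{d^k}})$. We get

$$\Sigma_f(D) = q_g + \Pi_{g_2}(V,\dots,V)\,s. \tag{6}$$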
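Equation (6) can likewise be checked numerically on a small random database. The sketch below assumes that the row function matrix $\Pi_h(T,\dots,T)$ of Definition 5 (stated earlier in the paper) has one row per $k$-tuple $(i_1,\dots,i_k)$ of rows of $T$, with entry $h(T_{i_1 j},\dots,T_{i_k j})$ in column $j$, and that $\Sigma_f(D)$ has the matching entry $\sum_j f(T_{i_1 j},\dots,T_{i_k j}, s_j)$. The function $f$, the helper `row_function_matrix`, and all sizes are arbitrary choices for illustration.

```python
import itertools
import numpy as np


def row_function_matrix(h, mats):
    """Rows indexed by tuples (i_1,...,i_k); column j holds h(M1[i_1,j], ..., Mk[i_k,j])."""
    n = mats[0].shape[1]
    rows = []
    for idx in itertools.product(*(range(m.shape[0]) for m in mats)):
        rows.append([h(*(m[i, j] for m, i in zip(mats, idx))) for j in range(n)])
    return np.array(rows, dtype=float)


def f(a, b, c):
    # An arbitrary boolean function of k + 1 = 3 variables: (a AND b) XOR c.
    return (a & b) ^ c


def g(*phis):
    # Equation (4) with inputs in {-1, 1}.
    return 2 * f(*((1 + p) // 2 for p in phis)) - 1


def g2(p1, p2):
    return (g(p1, p2, 1) - g(p1, p2, -1)) / 2


def g3(p1, p2):
    return (g(p1, p2, 1) + g(p1, p2, -1)) / 2


rng = np.random.default_rng(1)
d, n = 4, 7
t = rng.integers(0, 2, size=(d, n))       # T = U^T
s = rng.integers(0, 2, size=n)
v = 2 * t - 1                             # V = 2T - 1_{d x n}

# Direct computation of Sigma_f(D): one entry per pair (i_1, i_2).
direct = np.array([sum(f(t[i1, j], t[i2, j], s[j]) for j in range(n))
                   for i1, i2 in itertools.product(range(d), repeat=2)], dtype=float)

pi_g2 = row_function_matrix(g2, [v, v])
pi_g3 = row_function_matrix(g3, [v, v])
ones = np.ones(n)
q_g = (pi_g3 @ ones - pi_g2 @ ones + n) / 2   # q_g as defined above
via_eq6 = q_g + pi_g2 @ s                      # right-hand side of Equation (6)
```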

Let $y$ be the noisy approximation to $\Sigma_f(D)$ released by the privacy mechanism. Given $y$, the linear program that the adversary solves is:

$$\operatorname{argmin}_s \left\| y - q_g - \Pi_{g_2}(V,\dots,V)\,s \right\|_1. \tag{7}$$
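The $\ell_1$ minimization in Equation (7) can be written as a linear program in the standard way, with one slack variable per row. The sketch below is a generic version of that reformulation using SciPy's `scipy.optimize.linprog` with the HiGHS backend. Relaxing $s$ to $[0,1]^n$ and rounding at $1/2$ are our assumptions about how the attack is finished, and the random $\pm 1$ test matrix merely stands in for $\Pi_{g_2}(V,\dots,V)$.

```python
import numpy as np
from scipy.optimize import linprog


def lp_decoding_attack(y, q, a):
    """Minimize ||y - q - a s||_1 over s in [0,1]^n via an LP, then round to {0,1}.

    Variables are (s, t) with t_i >= |(y - q - a s)_i|; the objective is sum(t).
    """
    m, n = a.shape
    b = y - q
    c = np.concatenate([np.zeros(n), np.ones(m)])
    eye = np.eye(m)
    # a s - t <= b  and  -a s - t <= -b  together encode |b - a s| <= t.
    a_ub = np.block([[a, -eye], [-a, -eye]])
    b_ub = np.concatenate([b, -b])
    bounds = [(0, 1)] * n + [(0, None)] * m
    res = linprog(c, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return (res.x[:n] > 0.5).astype(int)


rng = np.random.default_rng(2)
m, n = 200, 20
a = rng.choice([-1.0, 1.0], size=(m, n))
s = rng.integers(0, 2, size=n)
q = rng.normal(size=m)                       # any known offset vector
noise = rng.uniform(-0.05, 0.05, size=m)     # small noise on every row ...
bad = rng.choice(m, size=10, replace=False)
noise[bad] += 50.0                           # ... plus gross errors on a few rows
y = q + a @ s + noise
s_rec = lp_decoding_attack(y, q, a)
```

Unlike the least squares attack, the $\ell_1$ objective tolerates a small fraction of rows with arbitrarily large errors, which is the setting of Theorem 3.3.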

The following theorem analyzes this attack using a random matrix V. 

Theorem 3.3 (Part 2, Theorem 3.1). Let $f: \{0,1\}^{k+1} \to \{0,1\}$ be a non-degenerate boolean function and let $n \le c\,d^k/\log_{(q)} d$ for a constant $c$ depending only on $k, q$. Then there exists a constant $\gamma = \gamma(k,q) > 0$ such that any mechanism which for every database $D \in (\{0,1\})^{n\times(d+1)}$ releases $\Sigma_f(D)$ by adding $o(\sqrt{n})$ (or releases $\Sigma_f(D)/n$ by adding $o(1/\sqrt{n})$) noise to at most a $1 - \gamma$ fraction of the entries is attribute non-private. The attack that achieves this non-privacy violation runs in time polynomial in $d^k$ and $n$.

Proof. The proof uses the LP decoding attack outlined in Theorem 2.2 on Equation (7). Let $T$ be a random matrix of dimension $d \times n$ with independent Bernoulli entries taking values $0$ and $1$ with probability $1/2$, and let database $D = (T^\top|s)$ for some $s \in \{0,1\}^n$. Let $V = 2T - 1_{d\times n}$. To use Theorem 2.2, we need to (i) establish that $\Pi_{g_2}(V,\dots,V)$ is a Euclidean section and (ii) establish a lower bound on its least singular value. Since $g_2: \{-1,1\}^k \to \{-1,0,1\}$ has a representation as a multilinear polynomial of degree $k$, Theorem B.4 shows that $\Pi_{g_2}(V,\dots,V)$ is with exponentially high probability a Euclidean section. Repeating an analysis similar to Theorem 3.9 shows that the least singular value of $\Pi_{g_2}(V,\dots,V)$ is with exponentially high probability $\Omega(\sqrt{d^k})$. Hence, with exponentially high probability both of the following statements hold simultaneously: (i) $\Pi_{g_2}(V,\dots,V)$ is a Euclidean section and (ii) $\sigma_n(\Pi_{g_2}(V,\dots,V)) = \Omega(\sqrt{d^k})$.

Invoking Theorem 2.2 with $\beta = o(\sqrt{n})$, $\alpha = \sqrt{n}$, $b = d^k$, and $\sigma = \Omega(\sqrt{d^k})$ shows that with exponentially high probability the adversary fails to recover only $o(n)$ entries of $s$. This shows that the mechanism is attribute non-private.

In the running time analysis of Theorem 2.2, $m$ is replaced by $d^k$ and the input size by $d^k n$ (as the input matrix can be represented using $O(d^k n)$ bits). A noise bound of $o(\sqrt{n})$ for releasing $\Sigma_f(D)$ translates into a noise bound of $o(1/\sqrt{n})$ for releasing $\Sigma_f(D)/n$. □
3.2 Spectral and Geometric Properties of Random Row Function Matrices.

The analysis of our reconstruction attacks relies on spectral and geometric properties of random row function matrices, which we discuss in this section. Rudelson [17] and Kasiviswanathan et al. [11] analyzed certain spectral and geometric properties of a class of correlated matrices that they referred to as row product matrices (or conjunction matrices). Our analysis builds upon these results. We first summarize some important definitions and useful results from [17, 11] in Section 3.2.1, and establish our least singular value bound in Section 3.2.2. The Euclidean section property is established in Appendix B.



3.2.1 Spectral and Geometric Properties of Random Row Product Matrices. 

For two matrices with the same number of columns, we define the row product as a matrix whose rows consist of entry-wise products of the rows of the original matrices.

Definition 6 (Row Product Matrix). The entry-wise product of vectors $p, q \in \mathbb{R}^n$ is the vector $p \odot q \in \mathbb{R}^n$ with entries $(p \odot q)_i = p_i \cdot q_i$. If $T_{(1)}$ is an $N_1 \times n$ matrix and $T_{(2)}$ is an $N_2 \times n$ matrix, denote by $T_{(1)} \odot T_{(2)}$ the $N_1 N_2 \times n$ matrix whose rows are entry-wise products of the rows of $T_{(1)}$ and $T_{(2)}$: $(T_{(1)} \odot T_{(2)})_{j,k} = T_{(1)j} \odot T_{(2)k}$, where $(T_{(1)} \odot T_{(2)})_{j,k}$, $T_{(1)j}$, $T_{(2)k}$ denote rows of the corresponding matrices.
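Definition 6 translates directly into array code. The sketch below uses a helper we call `row_product` (our name) and assumes that the rows of $T_{(1)} \odot T_{(2)}$ are ordered lexicographically by the pair $(j,k)$, i.e., row $j \cdot N_2 + k$ holds $T_{(1)j} \odot T_{(2)k}$; the definition does not fix an ordering, so this is a convention.

```python
import numpy as np


def row_product(t1: np.ndarray, t2: np.ndarray) -> np.ndarray:
    """Row product of an N1 x n and an N2 x n matrix (Definition 6).

    Row j * N2 + k of the result is the entry-wise product of row j of t1 and row k of t2.
    """
    if t1.shape[1] != t2.shape[1]:
        raise ValueError("both matrices must have the same number of columns")
    n1, n = t1.shape
    n2 = t2.shape[0]
    return (t1[:, None, :] * t2[None, :, :]).reshape(n1 * n2, n)


t1 = np.array([[1, 0, 1],
               [0, 1, 1]])
t2 = np.array([[1, 1, 0],
               [1, 0, 1],
               [0, 1, 1]])
p = row_product(t1, t2)   # 6 x 3 matrix
```

A $k$-times product $T_{(1)} \odot \cdots \odot T_{(k)}$ can then be formed by folding `row_product` over the list of matrices, for example with `functools.reduce`.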

Rudelson [17] showed that if we take the entry-wise product of $k$ independent random matrices of dimension $d \times n$, then the largest and the least singular values of the resulting row product matrix (which is of dimension $d^k \times n$) are asymptotically of the same order as those of a $d^k \times n$ matrix with i.i.d. entries. To formulate this result formally, we introduce a class of uniformly bounded random variables whose variances are uniformly bounded below.

Definition 7 ($\tau$-random variable and matrix). Let $\tau > 0$. We call a random variable $\xi$ a $\tau$-random variable if $|\xi| \le 1$ a.s., $\mathbb{E}[\xi] = 0$, and $\mathbb{E}[\xi^2] \ge \tau^2$. A matrix $M$ is called a $\tau$-random matrix if all its entries are independent $\tau$-random variables.

We also need the notion of the iterated logarithm $\log_{(q)}$, defined as follows.

Definition 8. For $q \in \mathbb{N}$, define the function $\log_{(q)}: (0,\infty) \to \mathbb{R}$ by induction:

1. $\log_{(1)} t = \max(\log t, 1)$;

2. $\log_{(q+1)} t = \log_{(1)}(\log_{(q)} t)$.
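Definition 8 is a two-line recursion; a direct transcription follows. The function name `log_iter` is ours, and the natural logarithm is assumed, since the base is not specified in this passage.

```python
import math


def log_iter(t: float, q: int) -> float:
    """Iterated logarithm log_(q) t from Definition 8 (natural log assumed)."""
    if t <= 0:
        raise ValueError("log_(q) is defined on (0, infinity)")
    if q < 1:
        raise ValueError("q must be a positive integer")
    value = max(math.log(t), 1.0)            # log_(1) t = max(log t, 1)
    for _ in range(q - 1):
        value = max(math.log(value), 1.0)    # log_(q+1) t = log_(1)(log_(q) t)
    return value
```

Because $\log_{(1)}$ never drops below 1, every further iteration stays in the domain $(0,\infty)$, and the function bottoms out at 1 for moderate $t$.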

We are now ready to state the main result from [17], which establishes a lower bound on the $\ell_1$ norm of $(T_{(1)} \odot \cdots \odot T_{(k)})x$, where each $T_{(j)}$ is an independent $\tau$-random matrix and $x$ is a unit vector.

Theorem 3.4 ([17]). Let $k, q, n, d$ be natural numbers. Assume that $n \le c\,d^k/\log_{(q)} d$. Let $T_{(1)},\dots,T_{(k)}$ be $k$ matrices with independent $\tau$-random entries and dimensions $d \times n$. Then the $k$-times entry-wise product $T_{(1)} \odot T_{(2)} \odot \cdots \odot T_{(k)}$ is a $d^k \times n$ matrix satisfying

$$\Pr\left[\exists x \in S^{n-1}: \left\|(T_{(1)} \odot \cdots \odot T_{(k)})x\right\|_1 \le C_1 d^k\right] \le c_1 \exp(-c_2 d).$$

The constants $c, C_1, c_1, c_2$ depend only on $k$ and $q$.

One of the main ingredients in proving the above theorem is the following fact about the norm of row product matrices.

When $N_1 = N_2$, the row product matrix is also called the Hadamard product matrix.
Theorem 3.5 ([17]). Let $T$ be a matrix with independent $\tau$-random entries and dimension $d \times n$. Then the $k$-times entry-wise product $T \odot \cdots \odot T$ is a $d^k \times n$ matrix satisfying

$$\Pr\left[\left\|T \odot \cdots \odot T\right\| \ge C_3\left(d^{k/2} + \sqrt{n}\right)\right] \le \exp\left(-c_4 n^{1/(2k)}\right).$$

The constants $C_3, c_4$ depend only on $k$.



The bound on the norm appearing in the above theorem (asymptotically) matches that for a $d^k \times n$ matrix with independent $\tau$-random entries (see [22] for more details).
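A quick numerical experiment illustrates the message of Theorems 3.4 and 3.5: for a row product of $k = 2$ independent $\pm 1$ matrices (which are $\tau$-random with $\tau = 1$), both extreme singular values are of order $\sqrt{d^k}$ once $n$ is well below $d^k$. The sizes, the seed, and the loose constants in the checks are arbitrary; this is an illustration, not a test of the theorems' constants.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, k = 20, 40, 2
t1 = rng.choice([-1.0, 1.0], size=(d, n))   # independent tau-random matrices (tau = 1)
t2 = rng.choice([-1.0, 1.0], size=(d, n))

product = (t1[:, None, :] * t2[None, :, :]).reshape(d * d, n)   # T_(1) (.) T_(2), d^2 x n
sv = np.linalg.svd(product, compute_uv=False)                   # sorted in descending order
sigma_max, sigma_min = sv[0], sv[-1]
scale = np.sqrt(d ** k)                                         # sqrt(d^k) = 20 here
print(f"sigma_max / sqrt(d^k) = {sigma_max / scale:.2f}")
print(f"sigma_min / sqrt(d^k) = {sigma_min / scale:.2f}")
```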

3.2.2 Least Singular Value of Random Row Function Matrices. 

We start by proving a simple proposition about functions that can be represented as multilinear polynomials. The main step behind the proof of the following proposition is a simple fact about multilinear polynomials. Let $P^{(h)}$ be a multilinear polynomial representing the function $h$, let $(\delta_1,\dots,\delta_k) \in \{0,1\}^k$, and let $\delta'_i \in \{0,1\}$; then

$$P^{(h)}(\delta_1,\dots,\delta_i,\dots,\delta_k) - P^{(h)}(\delta_1,\dots,\delta'_i,\dots,\delta_k) = h(\delta_1,\dots,\delta_i,\dots,\delta_k) - h(\delta_1,\dots,\delta'_i,\dots,\delta_k).$$

Proposition 3.6. Let $h$ be a function from $\{0,1\}^k \to \{-1,0,1\}$ having a representation as a multilinear polynomial of degree $k$. Let $P^{(h)}$ denote this multilinear polynomial. Let $(\delta_1,\dots,\delta_k), (\delta'_1,\dots,\delta'_k) \in \{0,1\}^k$. For $I \subseteq \{1,\dots,k\}$ let $\delta(I) \in \{0,1\}^k$ be the point with coordinates $\delta_j(I) = \delta'_j$ if $j \in I$ and $\delta_j(I) = \delta_j$ if $j \notin I$. Then

$$(\delta_1 - \delta'_1)\cdots(\delta_k - \delta'_k) = c_h(k)\sum_{I \subseteq [k]} (-1)^{|I|}\, h(\delta(I)),$$

where $1/c_h(k)$ is the coefficient of the monomial corresponding to all $k$ variables in the multilinear representation of $h$.

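Proof. By definition, for all $(\delta_1,\dots,\delta_k) \in \{0,1\}^k$, $h(\delta_1,\dots,\delta_k) = P^{(h)}(\delta_1,\dots,\delta_k)$. Since $P^{(h)}$ is a linear function of $\delta_1$,

$$P^{(h)}(\delta_1,\delta_2,\dots,\delta_k) - P^{(h)}(\delta'_1,\delta_2,\dots,\delta_k) = (\delta_1 - \delta'_1)\,\frac{\partial P^{(h)}}{\partial \delta_1}(\delta_1,\delta_2,\dots,\delta_k),$$

where $\partial P^{(h)}/\partial\delta_1$ denotes the partial derivative of $P^{(h)}(\delta_1,\delta_2,\dots,\delta_k)$ with respect to $\delta_1$. Repeating this for the other coordinates, we get

$$\sum_{I \subseteq [k]} (-1)^{|I|}\, h(\delta(I)) = (\delta_1 - \delta'_1)\cdots(\delta_k - \delta'_k)\cdot\left(\frac{\partial}{\partial\delta_1}\cdots\frac{\partial}{\partial\delta_k}P^{(h)}\right).$$

The last term on the right-hand side is the coefficient of the polynomial $P^{(h)}(\delta_1,\dots,\delta_k)$ corresponding to the monomial $\delta_1\cdots\delta_k$, and we denote it by $1/c_h(k)$. □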
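Proposition 3.6 can be verified exhaustively for small $k$. The sketch below computes the top multilinear coefficient $1/c_h(k)$ by Möbius inversion, $a_{[k]} = \sum_{x \in \{0,1\}^k} (-1)^{k-|x|} h(x)$, and checks $a_{[k]}\prod_j(\delta_j - \delta'_j) = \sum_{I \subseteq [k]} (-1)^{|I|} h(\delta(I))$ for every full-degree $h: \{0,1\}^2 \to \{-1,0,1\}$ and every pair $\delta, \delta'$. Multiplying through by $a_{[k]}$ keeps the arithmetic in integers; $k = 2$ is an arbitrary choice.

```python
import itertools
import math


def top_coefficient(h, k):
    """Coefficient of delta_1 * ... * delta_k in the multilinear extension of h."""
    return sum((-1) ** (k - sum(x)) * h[x] for x in itertools.product((0, 1), repeat=k))


def identity_holds(h, k):
    a = top_coefficient(h, k)
    cube = list(itertools.product((0, 1), repeat=k))
    for delta, delta_p in itertools.product(cube, repeat=2):
        lhs = a * math.prod(dj - dpj for dj, dpj in zip(delta, delta_p))
        rhs = 0
        for size in range(k + 1):
            for subset in itertools.combinations(range(k), size):
                # delta(I): take delta'_j for j in I and delta_j otherwise.
                point = tuple(delta_p[j] if j in subset else delta[j] for j in range(k))
                rhs += (-1) ** size * h[point]
        if lhs != rhs:
            return False
    return True


k = 2
cube = list(itertools.product((0, 1), repeat=k))
full_degree = []
for values in itertools.product((-1, 0, 1), repeat=len(cube)):
    h = dict(zip(cube, values))
    if top_coefficient(h, k) != 0:          # degree exactly k
        full_degree.append(h)
all_hold = all(identity_holds(h, k) for h in full_degree)
```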
Corollary 3.7 (Corollary to Proposition 3.6). Let $h$ be as in Proposition 3.6, and let $T_{(1)},\dots,T_{(k)}, T'_{(1)},\dots,T'_{(k)}$ be $2k$ matrices with $\{0,1\}$ entries and dimensions $d \times n$. For a set $I \subseteq [k]$ denote $T_{(j)}(I) = T'_{(j)}$ if $j \in I$ and $T_{(j)}(I) = T_{(j)}$ if $j \notin I$. Then the following holds:

$$(T_{(1)} - T'_{(1)}) \odot \cdots \odot (T_{(k)} - T'_{(k)}) = c_h(k)\sum_{I \subseteq [k]} (-1)^{|I|}\,\Pi_h\left(T_{(1)}(I),\dots,T_{(k)}(I)\right).$$

Proof. Follows by a simple extension of Proposition 3.6, using Definitions 5 and 6. □
For a random $T$, the following theorem shows that the $\ell_1$-norm of $\Pi_h(T,\dots,T)x$ is "big" for all $x \in S^{n-1}$.

Theorem 3.8. Let $k, q, n, d$ be natural numbers. Assume that $n \le c\,d^k/\log_{(q)} d$. Consider a $d \times n$ matrix $T$ with independent Bernoulli entries taking values $0$ and $1$ with probability $1/2$. Let $h$ be a function from $\{0,1\}^k \to \{-1,0,1\}$ having a representation as a multilinear polynomial of degree $k$. Then the matrix $\Pi_h(T,\dots,T)$ satisfies

$$\Pr\left[\exists x \in S^{n-1}: \left\|\Pi_h(T,\dots,T)\,x\right\|_1 \le C' d^k\right] \le c_1 \exp(-c_2 d).$$

The constants $c, C', c_1, c_2$ depend only on $k$ and $q$.

Proof. First notice that if an $a_1 \times n$ matrix $M'$ is formed from the $a_2 \times n$ matrix $M$ by taking a subset of rows, then for any $x \in \mathbb{R}^n$, $\|M'x\|_1 \le \|Mx\|_1$.

Let $d = 2kd' + l$, where $0 \le l < 2k$. For $j = 1,\dots,k$ denote by $T_j^1$ the submatrix of $T$ consisting of rows $2d'(j-1)+1,\dots,2d'(j-1)+d'$, and by $T_j^2$ the submatrix consisting of rows $2d'(j-1)+d'+1,\dots,2jd'$. For a set $I \subseteq [k]$ denote

$$T_{(j)}(I) = \begin{cases} T_j^2 & \text{if } j \in I,\\ T_j^1 & \text{if } j \notin I.\end{cases}$$

Then Corollary 3.7 implies

$$(T_1^1 - T_1^2) \odot \cdots \odot (T_k^1 - T_k^2) = c_h(k)\sum_{I \subseteq [k]} (-1)^{|I|}\,\Pi_h\left(T_{(1)}(I),\dots,T_{(k)}(I)\right).$$

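Therefore, by the triangle inequality,

$$\left\|\left((T_1^1 - T_1^2) \odot \cdots \odot (T_k^1 - T_k^2)\right)x\right\|_1 \le |c_h(k)|\sum_{I \subseteq [k]}\left\|\Pi_h\left(T_{(1)}(I),\dots,T_{(k)}(I)\right)x\right\|_1 \le |c_h(k)|\,2^k\left\|\Pi_h(T,\dots,T)\,x\right\|_1. \tag{8}$$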

The last inequality follows since for every $I$, $\Pi_h(T_{(1)}(I),\dots,T_{(k)}(I))$ is a submatrix of $\Pi_h(T,\dots,T)$, which implies that

$$\left\|\Pi_h\left(T_{(1)}(I),\dots,T_{(k)}(I)\right)x\right\|_1 \le \left\|\Pi_h(T,\dots,T)\,x\right\|_1.$$

The entries of the matrices $(T_1^1 - T_1^2),\dots,(T_k^1 - T_k^2)$ are independent $\tau$-random variables. Thus, Theorem 3.4 (note that $d' = \Theta(d)$) yields

$$\Pr\left[\exists x \in S^{n-1}: \left\|\left((T_1^1 - T_1^2) \odot \cdots \odot (T_k^1 - T_k^2)\right)x\right\|_1 \le C d^k\right] \le c_1 \exp(-c_2 d),$$

which along with Equation (8) proves the theorem with $C' = C/(|c_h(k)|\,2^k)$. Note that $|c_h(k)|$ can be bounded as a function of $k$ alone. □
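The block-splitting argument in this proof can be replayed numerically for $k = 2$. The sketch below assumes that $\Pi_h(A,B)$ has one row per pair $(i_1,i_2)$, ordered lexicographically, with entries $h(A_{i_1 j}, B_{i_2 j})$. It checks the matrix identity of Corollary 3.7 (scaled by $a = 1/c_h(2)$ to stay in integers) and inequality (8) for a random unit vector $x$. The truth table for $h$, the helper `pi_h`, and all sizes are arbitrary choices.

```python
import itertools
import numpy as np


def pi_h(h_table, a, b):
    """Row function matrix Pi_h(A, B): row (i1, i2) has entries h(A[i1, j], B[i2, j])."""
    n = a.shape[1]
    return h_table[a[:, None, :], b[None, :, :]].reshape(-1, n)


# h: {0,1}^2 -> {-1,0,1} as a truth table; h(0,0)=1, h(0,1)=0, h(1,0)=-1, h(1,1)=1.
h_table = np.array([[1, 0],
                    [-1, 1]])
a_top = h_table[0, 0] - h_table[0, 1] - h_table[1, 0] + h_table[1, 1]   # = 1/c_h(2), here 3

rng = np.random.default_rng(4)
k, d_prime, n = 2, 5, 8
d = 2 * k * d_prime + 1                # d = 2k d' + l with l = 1
t = rng.integers(0, 2, size=(d, n))

# Blocks T_j^1 (r = 0) and T_j^2 (r = 1) as in the proof, with 0-based row ranges.
blocks = {(j, r): t[2 * d_prime * j + r * d_prime: 2 * d_prime * j + (r + 1) * d_prime]
          for j in range(k) for r in range(2)}

diff_product = ((blocks[0, 0] - blocks[0, 1])[:, None, :]
                * (blocks[1, 0] - blocks[1, 1])[None, :, :]).reshape(-1, n)

signed_sum = np.zeros_like(diff_product)
for subset_bits in itertools.product((0, 1), repeat=k):
    # T_(j)(I) = T_j^2 if j in I, else T_j^1; subset_bits[j] = 1 means j is in I.
    mats = [blocks[j, subset_bits[j]] for j in range(k)]
    signed_sum += (-1) ** sum(subset_bits) * pi_h(h_table, *mats)

x = rng.normal(size=n)
x /= np.linalg.norm(x)
lhs = np.abs(diff_product @ x).sum()
rhs = abs(1 / a_top) * 2 ** k * np.abs(pi_h(h_table, t, t) @ x).sum()
```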
Combining Theorem 3.8 with the Cauchy-Schwarz inequality, we obtain a lower bound on the least singular value of $\Pi_h(T,\dots,T)$. It is well known [19, 22] that the least singular value of a $d^k \times n$ matrix with independent $\tau$-random entries is $\Theta(\sqrt{d^k})$ when $n$ is sufficiently smaller than $d^k$. Therefore, in spite of all the correlations that exist in $\Pi_h(T,\dots,T)$, its least singular value is asymptotically of the same order as that of an i.i.d. matrix.

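Theorem 3.9. Under the assumptions of Theorem 3.8,

$$\Pr\left[\sigma_n\left(\Pi_h(T,\dots,T)\right) \le C'\sqrt{d^k}\right] \le c_1 \exp(-c_2 d).$$

Proof. By the Cauchy-Schwarz inequality,

$$\left\|\Pi_h(T,\dots,T)\,x\right\|_1 \le \sqrt{d^k}\,\left\|\Pi_h(T,\dots,T)\,x\right\|.$$

Therefore,

$$\Pr\left[\exists x \in S^{n-1}: \left\|\Pi_h(T,\dots,T)\,x\right\| \le C'\sqrt{d^k}\right] \le \Pr\left[\exists x \in S^{n-1}: \left\|\Pi_h(T,\dots,T)\,x\right\|_1 \le C' d^k\right].$$

The right-hand side probability can be bounded using Theorem 3.8, and the left-hand side probability is exactly

$$\Pr\left[\sigma_n\left(\Pi_h(T,\dots,T)\right) \le C'\sqrt{d^k}\right].$$

□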



4 Releasing M-estimators

In this section, we analyze privacy lower bounds for releasing M-estimators. Assume we have $n$ samples $x_1,\dots,x_n \in \mathbb{R}^{k+1}$, and consider the following optimization problem:

$$\mathcal{L}(\theta; x_1,\dots,x_n) = \frac{1}{n}\sum_{i=1}^n \ell(\theta; x_i), \tag{9}$$

where $\theta \in \Theta \subseteq \mathbb{R}^k$, the separable loss function $\mathcal{L}: \Theta \times (\mathbb{R}^{k+1})^n \to \mathbb{R}$ measures the "fit" of $\theta \in \Theta$ to any given data $x_1,\dots,x_n$, and $\ell: \Theta \times \mathbb{R}^{k+1} \to \mathbb{R}$ is the loss function associated with a single data point.

It is common to assume that the loss function has certain properties, e.g., for any given sample $x_1,\dots,x_n$, the loss function assigns a cost $\mathcal{L}(\theta; x_1,\dots,x_n) \ge 0$ to the estimator $\theta$. The M-estimator associated with a given function $\mathcal{L}(\theta; x_1,\dots,x_n) \ge 0$ is

$$\hat\theta = \operatorname{argmin}_{\theta \in \Theta} \mathcal{L}(\theta; x_1,\dots,x_n) = \operatorname{argmin}_{\theta \in \Theta} \sum_{i=1}^n \ell(\theta; x_i).$$

For a differentiable (convex) loss function $\mathcal{L}$, the estimator $\hat\theta$ can be found by setting $\partial\mathcal{L}(\theta; x_1,\dots,x_n)$ to zero.

M-estimators are natural extensions of Maximum Likelihood Estimators (MLE) [16]. They enjoy similar consistency properties and are asymptotically normal. There are several reasons for studying these estimators: (i) they may be more computationally efficient than the MLE, and (ii) they may be more robust (resistant to deviations) than the MLE. The linear regression MLE is captured by setting $\ell(\theta; x) = (y - \langle z, \theta\rangle)^2$ in Equation (9), where $x = (z, y)$, and the MLE for logistic regression is captured by setting $\ell(\theta; x) = \ln(1 + \exp\langle z, \theta\rangle) - y\langle z, \theta\rangle$, the negative log-likelihood of a single sample [12].
Problem Statement. Let $D = (U|s)$ be a database of dimension $n \times (d+1)$, where $U$ is a real-valued matrix of dimension $n \times d$ and $s \in \{0,1\}^n$ is a (secret) vector. Write the entries of $U$ as $\delta_{ij}$. Consider the submatrix $D|_J$ of $D$, where $J \in \{1,\dots,d\}^k$. Let $\theta_J$ be the M-estimator for $D|_J$ defined as

$$\theta_J = \operatorname{argmin}_{\theta \in \Theta} \sum_{i=1}^n \ell\left(\theta; (\delta_{ij_1},\dots,\delta_{ij_k}, s_i)\right), \quad\text{where } J = (j_1,\dots,j_k).$$

The goal is to understand how much noise is needed to attribute-privately release $\theta_J$ for all $J$'s when the loss function $\ell$ is differentiable.

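Basic Scheme. Consider a differentiable loss function $\ell$. Let $D = (U|s)$. Let $U = (U_{(1)}|\cdots|U_{(d/k)})$, where each $U_{(i)}$ is an $n \times k$ matrix (assume $d$ is a multiple of $k$ for simplicity). Consider any $U_{(i)}$. We have

$$\partial\mathcal{L}\left(\theta; (U_{(i)}, s)\right) = \frac{1}{n}\sum_{j=1}^n \partial\ell\left(\theta; (U_{(i)j}, s_j)\right),$$

where $U_{(i)j}$ is the $j$th row of $U_{(i)}$. Then the M-estimator $\theta_i$ for $i \in [d/k]$ is obtained by solving

$$\frac{1}{n}\sum_{j=1}^n \partial\ell\left(\theta; (U_{(i)j}, s_j)\right) = 0. \tag{10}$$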

This gives a set of constraints over $s$ that the adversary can use to reconstruct $s$. For linear and logistic regression, Equation (10) reduces to the form $U_{(i)}^\top s - r = 0$, where $r$ is a vector independent of $s$. For a general loss function, we use the fact that $s$ is binary, together with a decomposition similar to Equation (1). The other issue is that the adversary gets only noisy approximations of $\theta_1,\dots,\theta_{d/k}$; we overcome this problem by using the Lipschitz properties of the gradient of the loss function.

In the next subsection, we focus on the standard MLEs for linear and logistic regression. In Section 4.2, we consider general M-estimators; there we require an additional variance condition on the loss function. We use the following standard definition of a Lipschitz continuous gradient.

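Definition 9 (Lipschitz Continuous Gradient). The gradient of a function $G: \mathbb{R}^k \to \mathbb{R}$ is Lipschitz continuous with parameter $\lambda > 0$ if $\|\partial G(x) - \partial G(y)\| \le \lambda\|x - y\|$ for all $x, y$.

Remark: For any twice differentiable function $G$, the gradient of $G$ is Lipschitz continuous with parameter $\lambda$ if and only if $\|\partial^2 G(x)\| \le \lambda$ for all $x$, where $\partial^2 G(x)$ denotes the Hessian matrix and $\|\cdot\|$ the operator norm; for convex $G$ this is the condition $\lambda I \succeq \partial^2 G(x)$ [20].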

4.1 Releasing Linear and Logistic Regression Estimators. 

In this section, we establish distortion lower bounds for attribute privately releasing linear and logistic 
regression estimators. 

Linear Regression Background. A general linear regression problem can be represented as $s = X\theta + \varepsilon$, where $s = (s_1,\dots,s_n) \in \mathbb{R}^n$ is a vector of observed responses, $X \in \mathbb{R}^{n\times k}$ is a matrix of regressors, $\varepsilon = (\varepsilon_1,\dots,\varepsilon_n)$ is an unobserved error vector where each $\varepsilon_i$ accounts for the discrepancy between the actual response ($s_i$) and the predicted outcome ($\langle X_i, \theta\rangle$), and $\theta \in \mathbb{R}^k$ is a vector of unknown parameters.
The least squares problem $\min_\theta \|s - X\theta\|^2$ has the closed-form solution $\hat\theta = (X^\top X)^{-1}X^\top s$ when $X^\top X$ is invertible (also known as the ordinary least squares estimator). Let $\mathcal{L}_{\mathrm{lin}}(\theta; s, X, \sigma^2)$ denote the log-likelihood of linear regression (the likelihood expression is given in Appendix C). The gradient (w.r.t. $\theta$) of the log-likelihood is given by [12]

$$\partial\mathcal{L}_{\mathrm{lin}}(\theta; s, X, \sigma^2) = \frac{1}{\sigma^2}\left(X^\top s - X^\top X\theta\right).$$

Logistic Regression Background. Logistic regression models estimate probabilities of events as functions of independent variables. Let $s_i$ be a binary variable representing the value of the dependent variable for the $i$th input, and let the values of $k$ independent variables for this same case be represented as $x_{ij}$ ($j = 1,\dots,k$). Let $n$ denote the total sample size, and let $C_i$ denote the probability of success ($\Pr[s_i = 1]$). The design matrix of independent variables, $X = (x_{ij})$, is composed of $n$ rows and $k$ columns.

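The logistic regression model equates the logit transform, the log-odds of the probability of a success, to the linear component:

$$\ln\left(\frac{C_i}{1 - C_i}\right) = \sum_{j=1}^k x_{ij}\theta_j = \langle X_i, \theta\rangle, \quad\text{i.e.,}\quad C_i = \frac{\exp\langle X_i, \theta\rangle}{1 + \exp\langle X_i, \theta\rangle},$$

where $X_i$ is the $i$th row in $X$.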

To emphasize the dependence of $C_i$ on $\theta$, we use the notation $C_i = C_i(\theta)$. Let $\mathcal{L}_{\mathrm{log}}(\theta; s, X)$ denote the log-likelihood of logistic regression (see Appendix C for the likelihood expression). The gradient (w.r.t. $\theta$) of the log-likelihood is given by [12]

$$\partial\mathcal{L}_{\mathrm{log}}(\theta; s, X) = X^\top s - X^\top \operatorname{vert}\left(C_1(\theta),\dots,C_n(\theta)\right),$$

where the notation $\operatorname{vert}(\cdot,\dots,\cdot)$ denotes vertical concatenation of the argument matrices/vectors. Our analysis requires a bound on the Lipschitz constant of the gradient of the log-likelihood function, which we obtain from the following claim.

Claim 4. The Lipschitz constant $\lambda_{\mathrm{log}}$ of the gradient of the log-likelihood $\partial\mathcal{L}_{\mathrm{log}}(\theta; s, X)$ can be bounded by the operator norm of $X^\top X$.

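Proof. From Definition 9 and the remark following it, the Lipschitz constant of the gradient of the log-likelihood $\partial\mathcal{L}_{\mathrm{log}}(\theta; s, X)$ can be bounded by the largest absolute eigenvalue of the Hessian of $\mathcal{L}_{\mathrm{log}}(\theta; s, X)$. The $(i,j)$th entry of the Hessian of $\mathcal{L}_{\mathrm{log}}(\theta; s, X)$ is

$$\frac{\partial^2\mathcal{L}_{\mathrm{log}}(\theta; s, X)}{\partial\theta_i\,\partial\theta_j} = -\sum_{l=1}^n x_{l,i}\,x_{l,j}\,C_l(\theta)\left(1 - C_l(\theta)\right),$$

where $x_{a,b}$ denotes the $(a,b)$th entry in $X$. Note that the Hessian is a $k \times k$ matrix; in matrix form it equals $-X^\top W X$ with $W = \operatorname{diag}\left(C_1(\theta)(1 - C_1(\theta)),\dots,C_n(\theta)(1 - C_n(\theta))\right)$. Since $0 \le C_l(\theta) \le 1$ (as $C_l(\theta)$ represents a probability), every diagonal entry of $W$ lies in $[0, 1/4]$, so $0 \preceq X^\top W X \preceq \frac{1}{4}X^\top X$. Hence the operator norm of the Hessian is at most $\frac{1}{4}\|X^\top X\| \le \|X^\top X\|$, and the Lipschitz constant $\lambda_{\mathrm{log}}$ of $\partial\mathcal{L}_{\mathrm{log}}(\theta; s, X)$ can be bounded by the operator norm of $X^\top X$ (see the remark after Definition 9). □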
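The sketch below fits a logistic regression by Newton's method on the log-likelihood (a standard numerical choice we make for illustration, not a procedure taken from the paper), checks that the gradient $X^\top s - X^\top \operatorname{vert}(C_1(\theta),\dots,C_n(\theta))$ vanishes at the fit, and confirms the Hessian bound used in Claim 4. The data, true parameters, and iteration count are arbitrary.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_log(theta, s, x):
    """Gradient of the logistic log-likelihood: X^T s - X^T vert(C_1(theta), ..., C_n(theta))."""
    return x.T @ s - x.T @ sigmoid(x @ theta)


def hess_log(theta, x):
    """Hessian of the logistic log-likelihood: -X^T diag(C_l (1 - C_l)) X."""
    c = sigmoid(x @ theta)
    return -(x * (c * (1 - c))[:, None]).T @ x


def fit_logistic(s, x, iters=50):
    """Newton's method for the logistic MLE (assumes the MLE exists, i.e. no separation)."""
    theta = np.zeros(x.shape[1])
    for _ in range(iters):
        theta = theta - np.linalg.solve(hess_log(theta, x), grad_log(theta, s, x))
    return theta


rng = np.random.default_rng(6)
n, k = 200, 2
x = rng.normal(size=(n, k))
true_theta = np.array([1.0, -0.5])
s = (rng.uniform(size=n) < sigmoid(x @ true_theta)).astype(float)

theta_hat = fit_logistic(s, x)
hess_norm = np.linalg.norm(hess_log(theta_hat, x), 2)   # spectral norm of the Hessian
gram_norm = np.linalg.norm(x.T @ x, 2)                  # operator norm of X^T X
```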
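Creating a Linear Reconstruction Problem for Linear Regression. Let $U = (U_{(1)}|\cdots|U_{(d/k)})$, where each $U_{(i)}$ is an $n \times k$ matrix (assume $d$ is a multiple of $k$). Let $\theta_i$ be the solution to the MLE equation

$$\partial\mathcal{L}_{\mathrm{lin}}(\theta; s, U_{(i)}, \sigma^2) = 0, \quad\text{i.e.,}\quad U_{(i)}^\top s - U_{(i)}^\top U_{(i)}\theta = 0.$$

The adversary gets $\tilde\theta_i$'s, which are noisy approximations to the $\theta_i$'s. Given $\tilde\theta_1,\dots,\tilde\theta_{d/k}$, the adversary solves the following set of linear constraints to reconstruct $s$:

$$\partial\mathcal{L}_{\mathrm{lin}}(\tilde\theta_1; s, U_{(1)}, \sigma^2) = \cdots = \partial\mathcal{L}_{\mathrm{lin}}(\tilde\theta_{d/k}; s, U_{(d/k)}, \sigma^2) = 0.$$

This can be rewritten as:

$$U_{(1)}^\top s - U_{(1)}^\top U_{(1)}\tilde\theta_1 = \cdots = U_{(d/k)}^\top s - U_{(d/k)}^\top U_{(d/k)}\tilde\theta_{d/k} = 0. \tag{11}$$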

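Creating a Linear Reconstruction Problem for Logistic Regression. This reduction is very similar to the linear regression case. As before, let $U = (U_{(1)}|\cdots|U_{(d/k)})$. Let $\theta_i$ be the solution to the MLE equation

$$U_{(i)}^\top s - U_{(i)}^\top \operatorname{vert}\left(C_{(i)1}(\theta),\dots,C_{(i)n}(\theta)\right) = 0.$$

Let $C_{(i)}(\theta) = \operatorname{vert}(C_{(i)1}(\theta),\dots,C_{(i)n}(\theta))$. The adversary gets $\tilde\theta_i$'s, which are noisy approximations to the $\theta_i$'s. Using $\tilde\theta_1,\dots,\tilde\theta_{d/k}$ and $U$, the adversary can construct $C_{(1)}(\tilde\theta_1),\dots,C_{(d/k)}(\tilde\theta_{d/k})$. The adversary then solves the following set of linear constraints to reconstruct $s$:

$$\partial\mathcal{L}_{\mathrm{log}}(\tilde\theta_1; s, U_{(1)}) = \cdots = \partial\mathcal{L}_{\mathrm{log}}(\tilde\theta_{d/k}; s, U_{(d/k)}) = 0.$$

This can be rewritten as:

$$U_{(1)}^\top s - U_{(1)}^\top C_{(1)}(\tilde\theta_1) = \cdots = U_{(d/k)}^\top s - U_{(d/k)}^\top C_{(d/k)}(\tilde\theta_{d/k}) = 0. \tag{12}$$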

Setting up the Least Squares and LP Decoding Attacks. Consider linear regression. The attacks operate on the linear reconstruction problem of Equation (11). The least squares attack constructs $\hat{s}$ (an approximation of $s$) by minimizing the $\ell_2$ norm of the left-hand side of Equation (11), whereas the LP decoding attack minimizes the $\ell_1$ norm. The attacks are similar for logistic regression, except that they operate on Equation (12).

In the following analysis, we use a random matrix for $U$, where each entry is an independent $\tau$-random variable (Definition 7).

Theorem 4.1. Let $d \ge 2n$ and set $k = 1$. Then

• Any privacy mechanism which for every database $D = (U|s)$, where $U \in \mathbb{R}^{n\times d}$ and $s \in \{0,1\}^n$, releases the estimators of the linear/logistic regression model between every column of $U$ and $s$ by adding $o(1/\sqrt{n})$ noise to each estimator is attribute non-private. The attack that achieves this attribute non-privacy violation runs in $O(dn^2)$ time.

• There exists a constant $\gamma > 0$ such that any mechanism which for every database $D = (U|s)$, where $U \in \mathbb{R}^{n\times d}$ and $s \in \{0,1\}^n$, releases the estimators of the linear/logistic regression model between every column of $U$ and $s$ by adding $o(1/\sqrt{n})$ noise to at most a $1 - \gamma$ fraction of the estimators is attribute non-private. The attack that achieves this attribute non-privacy violation runs in time polynomial in $d$ and $n$.

Equivalently, the set of linear constraints preceding Equation (11) can be written as a single constraint:

$$\operatorname{vert}\left(\partial\mathcal{L}_{\mathrm{lin}}(\tilde\theta_1; s, U_{(1)}, \sigma^2),\dots,\partial\mathcal{L}_{\mathrm{lin}}(\tilde\theta_{d/k}; s, U_{(d/k)}, \sigma^2)\right) = 0.$$

Proof. We first do the analysis for linear regression. Let $D = (U|s)$, where $U$ is a $\tau$-random matrix of dimension $n \times d$.

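Analysis for Linear Regression. Now $\theta_i$ is the solution to the MLE equation $U_{(i)}^\top s - U_{(i)}^\top U_{(i)}\theta = 0$, where $U_{(i)}$ is the $i$th column of $U$. Since $k = 1$, we have $\theta_i \in \mathbb{R}$. The adversary gets noisy approximations of $\theta_1,\dots,\theta_d$. Let $\tilde\theta_1,\dots,\tilde\theta_d$ be these noisy approximations, with $\tilde\theta_i = \theta_i + e_i$ (for some unknown $e_i$). The adversary solves the following set of linear constraints:

$$U_{(1)}^\top s - U_{(1)}^\top U_{(1)}\tilde\theta_1 = \cdots = U_{(d)}^\top s - U_{(d)}^\top U_{(d)}\tilde\theta_d = 0.$$

This can be rewritten as:

$$U^\top s - \operatorname{vert}\left(U_{(1)}^\top U_{(1)}\tilde\theta_1,\dots,U_{(d)}^\top U_{(d)}\tilde\theta_d\right) = 0.$$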

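Let us first look at the least squares attack. Let $e = (e_1,\dots,e_d)$ (i.e., $e$ is the error vector). We have $U_{(i)}^\top U_{(i)} \le n$ for all $i \in [d]$ (as $U$ is a $\tau$-random matrix, all entries in the matrix are at most 1 in absolute value, and $U_{(i)}$ is the $i$th column of $U$). The least squares attack produces an estimate $\hat{s}$ by solving:

$$\operatorname{argmin}_s \left\|U^\top s - \operatorname{vert}\left(U_{(1)}^\top U_{(1)}\tilde\theta_1,\dots,U_{(d)}^\top U_{(d)}\tilde\theta_d\right)\right\|.$$

Since $\tilde\theta_i = \theta_i + e_i$, we get for all $i \in [d]$

$$U_{(i)}^\top U_{(i)}\tilde\theta_i = U_{(i)}^\top U_{(i)}\theta_i + U_{(i)}^\top U_{(i)}e_i, \qquad \left|U_{(i)}^\top U_{(i)}e_i\right| \le n|e_i|.$$

This implies

$$\left\|U^\top s - \operatorname{vert}\left(U_{(1)}^\top U_{(1)}\tilde\theta_1,\dots,U_{(d)}^\top U_{(d)}\tilde\theta_d\right)\right\| \le \left\|U^\top s - \operatorname{vert}\left(U_{(1)}^\top U_{(1)}\theta_1,\dots,U_{(d)}^\top U_{(d)}\theta_d\right)\right\| + n\|e\|.$$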

The remaining analysis is similar to Theorem 2.1, except that each error term $e_i$ is scaled up by a factor of (at most) $n$. Since $U$ is a $\tau$-random matrix, it is well known that if $d \ge 2n$, then the least singular value of the $d \times n$ matrix $U^\top$ is with exponentially high probability $\Omega(\sqrt{d})$ (see, e.g., [19]). Therefore, if a privacy mechanism adds $o(\sqrt{n})/n = o(1/\sqrt{n})$ noise to each $\theta_i$, then the least squares attack with exponentially high probability recovers a $1 - o(1)$ fraction of the entries of $s$. The time for executing this attack is $O(dn^2)$, as the attack requires computing the SVD of a $d \times n$ matrix.

The LP decoding attack produces an estimate $\hat{s}$ by solving:

$$\operatorname{argmin}_s \left\|U^\top s - \operatorname{vert}\left(U_{(1)}^\top U_{(1)}\tilde\theta_1,\dots,U_{(d)}^\top U_{(d)}\tilde\theta_d\right)\right\|_1.$$

The analysis follows from Theorem 2.2, except that each error is scaled up by a factor of (at most) $n$. Since $U$ is a $\tau$-random matrix, it also holds that with exponentially high probability $U$ is a Euclidean section [10]. Therefore, if a privacy mechanism adds $o(\sqrt{n})/n = o(1/\sqrt{n})$ noise to at most a $1 - \gamma$ fraction of the $\theta_i$'s, then the LP decoding attack with exponentially high probability recovers a $1 - o(1)$ fraction of the entries of $s$. The time for executing this attack is polynomial in $d$ and $n$ (as there are $d$ constraints, $n$ variables, and the number of bits in the input is bounded by $dn$).

The $n$ in the denominator of the noise bound is due to the scaling of the noise $e$.
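The least squares attack from the linear-regression part of this proof can be simulated end to end. The sketch below assumes the $k = 1$ model without an intercept used above, so each released $\theta_i = U_{(i)}^\top s / U_{(i)}^\top U_{(i)}$. It uses a $\pm 1$ matrix for $U$ ($\tau$-random with $\tau = 1$), $d = 4n$, and per-estimator noise of at most $0.02$, which is well below $1/\sqrt{n} \approx 0.14$ for $n = 50$; all of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 50, 200                                   # d >= 2n as in Theorem 4.1
u = rng.choice([-1.0, 1.0], size=(n, d))         # tau-random matrix U (n x d)
s = rng.integers(0, 2, size=n)                   # secret column

col_sq = np.sum(u * u, axis=0)                   # U_(i)^T U_(i) for each column i
theta = (u.T @ s) / col_sq                       # released estimators theta_i (k = 1)
theta_noisy = theta + rng.uniform(-0.02, 0.02, size=d)   # mechanism's noise e_i

# Least squares attack: argmin_s || U^T s - vert(U_(i)^T U_(i) theta~_i) ||_2, then round.
target = col_sq * theta_noisy
s_hat, *_ = np.linalg.lstsq(u.T, target, rcond=None)
s_rec = (s_hat > 0.5).astype(int)
recovered_fraction = float(np.mean(s_rec == s))
```

Each error $e_i$ enters the right-hand side multiplied by $U_{(i)}^\top U_{(i)} = n$, which is the scaling that produces the $1/n$ factor in the noise bound.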

Analysis for Logistic Regression. Consider the estimator $\theta_i$ of the logistic regression model between the $i$th column of $U$ and $s$. $\theta_i$ is the solution to the MLE equation $U_{(i)}^\top s - U_{(i)}^\top C_{(i)}(\theta) = 0$, where $U_{(i)}$ is the $i$th column of $U$. Since $k = 1$, we have $\theta_i \in \mathbb{R}$. The adversary gets noisy approximations of $\theta_1,\dots,\theta_d$. Let $\tilde\theta_1,\dots,\tilde\theta_d$ be these noisy approximations, with $\tilde\theta_i = \theta_i + e_i$.

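Using $\tilde\theta_1,\dots,\tilde\theta_d$ and $U$, the adversary can construct $C_{(1)}(\tilde\theta_1),\dots,C_{(d)}(\tilde\theta_d)$. The adversary then solves the following set of linear constraints:

$$U_{(1)}^\top s - U_{(1)}^\top C_{(1)}(\tilde\theta_1) = \cdots = U_{(d)}^\top s - U_{(d)}^\top C_{(d)}(\tilde\theta_d) = 0.$$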

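Let us now apply the Lipschitz condition on the gradient of the log-likelihood function. Note that since $k = 1$, $\partial\mathcal{L}_{\mathrm{log}}(\theta_i; s, U_{(i)})$ is a scalar (for every $i \in [d]$). By the Lipschitz condition,

$$\left|\partial\mathcal{L}_{\mathrm{log}}(\tilde\theta_i; s, U_{(i)}) - \partial\mathcal{L}_{\mathrm{log}}(\theta_i; s, U_{(i)})\right| \le \lambda_{\mathrm{log}}\,|\tilde\theta_i - \theta_i| = \lambda_{\mathrm{log}}\,|e_i|,$$

where $\lambda_{\mathrm{log}} \le U_{(i)}^\top U_{(i)} \le n$ (using Claim 4). Therefore,

$$\left|\partial\mathcal{L}_{\mathrm{log}}(\tilde\theta_i; s, U_{(i)})\right| \le \left|\partial\mathcal{L}_{\mathrm{log}}(\theta_i; s, U_{(i)})\right| + n|e_i|.$$

Substituting $\partial\mathcal{L}_{\mathrm{log}}(\tilde\theta_i; s, U_{(i)}) = U_{(i)}^\top s - U_{(i)}^\top C_{(i)}(\tilde\theta_i)$ and $\partial\mathcal{L}_{\mathrm{log}}(\theta_i; s, U_{(i)}) = U_{(i)}^\top s - U_{(i)}^\top C_{(i)}(\theta_i)$ in the above inequality gives

$$\left|U_{(i)}^\top s - U_{(i)}^\top C_{(i)}(\tilde\theta_i)\right| \le \left|U_{(i)}^\top s - U_{(i)}^\top C_{(i)}(\theta_i)\right| + n|e_i|.$$

This implies

$$\left\|U^\top s - \operatorname{vert}\left(U_{(1)}^\top C_{(1)}(\tilde\theta_1),\dots,U_{(d)}^\top C_{(d)}(\tilde\theta_d)\right)\right\| \le \left\|U^\top s - \operatorname{vert}\left(U_{(1)}^\top C_{(1)}(\theta_1),\dots,U_{(d)}^\top C_{(d)}(\theta_d)\right)\right\| + n\|e\|.$$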
The least squares attack produces an estimate $\hat{s}$ by solving:

$$\operatorname{argmin}_s \left\|U^\top s - \operatorname{vert}\left(U_{(1)}^\top C_{(1)}(\tilde\theta_1),\dots,U_{(d)}^\top C_{(d)}(\tilde\theta_d)\right)\right\|.$$

The remaining analysis follows as in the linear regression case (from Theorem 2.1), and again the errors are scaled by a factor of (at most) $n$. Therefore, if a privacy mechanism adds $o(\sqrt{n})/n = o(1/\sqrt{n})$ noise to each $\theta_i$, then the least squares attack with exponentially high probability recovers a $1 - o(1)$ fraction of the entries of $s$. The time for executing this attack is $O(dn^2)$.

The LP decoding attack produces an estimate $\hat{s}$ by solving:

$$\operatorname{argmin}_s \left\|U^\top s - \operatorname{vert}\left(U_{(1)}^\top C_{(1)}(\tilde\theta_1),\dots,U_{(d)}^\top C_{(d)}(\tilde\theta_d)\right)\right\|_1.$$

Again, as in the linear regression case (using Theorem 2.2), any mechanism that adds $o(\sqrt{n})/n = o(1/\sqrt{n})$ noise to at most a $1 - \gamma$ fraction of the $\theta_i$'s is attribute non-private. The time for executing this attack is polynomial in $d$ and $n$. □
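A corresponding simulation for logistic regression follows. Each $\theta_i$ is the root of the scalar MLE equation $U_{(i)}^\top s = U_{(i)}^\top C_{(i)}(\theta)$, which we solve by bisection because for $k = 1$ the right-hand side is increasing in $\theta$. The bisection bracket, the noise level, and the sizes are our assumptions. The adversary then rebuilds $U_{(i)}^\top C_{(i)}(\tilde\theta_i)$ from the noisy estimates and runs the least squares step.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_mle_1d(col, s, lo=-30.0, hi=30.0, iters=200):
    """Root of col^T s - col^T sigmoid(col * theta) = 0 by bisection (k = 1, no intercept)."""
    target = col @ s
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if col @ sigmoid(col * mid) < target:   # col^T C(theta) is increasing in theta
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(8)
n, d = 50, 200
u = rng.choice([-1.0, 1.0], size=(n, d))
s = rng.integers(0, 2, size=n)

theta = np.array([logistic_mle_1d(u[:, i], s) for i in range(d)])
theta_noisy = theta + rng.uniform(-0.02, 0.02, size=d)

# The adversary rebuilds U_(i)^T C_(i)(theta~_i) and solves the least squares problem.
target = np.array([u[:, i] @ sigmoid(u[:, i] * theta_noisy[i]) for i in range(d)])
s_hat, *_ = np.linalg.lstsq(u.T, target, rcond=None)
s_rec = (s_hat > 0.5).astype(int)
recovered_fraction = float(np.mean(s_rec == s))
```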

Compared to Theorems 3.2 and 3.3, the above theorem requires a much larger $d$ (namely $d \ge 2n$). However, it is possible to reduce the requirement on $d$ to roughly $n^{1/k}$ if the released statistic is a degree-$k$ polynomial kernel of these regression functions. We defer the details to the full version.
4.2 Releasing General M-estimators.



In this section, we establish distortion lower bounds for attribute privately releasing M-estimators associated 
with differentiable loss functions. 



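Creating a Linear Reconstruction Problem. Consider Equation (10). Since $s$ is a binary vector, we can decompose $\partial\ell(\theta; (U_{(i)j}, s_j)) \in \mathbb{R}^k$ as follows:

$$\partial\ell\left(\theta; (U_{(i)j}, s_j)\right) = \ell_0(\theta; U_{(i)j})(1 - s_j) + \ell_1(\theta; U_{(i)j})\,s_j = \ell_0(\theta; U_{(i)j}) + \ell_2(\theta; U_{(i)j})\,s_j,$$

where $\ell_0(\theta; U_{(i)j}) = \partial\ell(\theta; (U_{(i)j}, 0)) \in \mathbb{R}^k$ and $\ell_1(\theta; U_{(i)j}) = \partial\ell(\theta; (U_{(i)j}, 1)) \in \mathbb{R}^k$. This is similar to the decomposition in Equation (1). Let $\ell_2(\theta; U_{(i)j}) = \ell_1(\theta; U_{(i)j}) - \ell_0(\theta; U_{(i)j})$.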

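Now the M-estimator $\theta_i$ between $U_{(i)}$ and $s$ can be found by setting $\partial\mathcal{L}(\theta; (U_{(i)}, s)) = 0$. Therefore, $\theta_i$ is the solution to (ignoring the scaling multiplier $1/n$)

$$\partial\mathcal{L}\left(\theta; (U_{(i)}, s)\right) = \sum_{j=1}^n \partial\ell\left(\theta; (U_{(i)j}, s_j)\right) = \sum_{j=1}^n \left(\ell_0(\theta; U_{(i)j}) + \ell_2(\theta; U_{(i)j})\,s_j\right) = 0.$$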

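The adversary gets $\tilde\theta_i$'s, which are noisy approximations to the $\theta_i$'s. Given $\tilde\theta_1,\dots,\tilde\theta_{d/k}$, the adversary solves the following set of linear constraints:

$$\partial\mathcal{L}\left(\tilde\theta_1; (U_{(1)}, s)\right) = \cdots = \partial\mathcal{L}\left(\tilde\theta_{d/k}; (U_{(d/k)}, s)\right) = 0, \quad\text{i.e.,}$$

$$\sum_{j=1}^n \left(\ell_0(\tilde\theta_1; U_{(1)j}) + \ell_2(\tilde\theta_1; U_{(1)j})\,s_j\right) = \cdots = \sum_{j=1}^n \left(\ell_0(\tilde\theta_{d/k}; U_{(d/k)j}) + \ell_2(\tilde\theta_{d/k}; U_{(d/k)j})\,s_j\right) = 0. \tag{13}$$

This can also be represented in matrix form. For every $i \in [d/k]$, define $A_{(i)}$ as a $k \times n$ matrix whose $j$th column is $\ell_0(\tilde\theta_i; U_{(i)j})$, and $B_{(i)}$ as a $k \times n$ matrix whose $j$th column is $\ell_2(\tilde\theta_i; U_{(i)j})$. Then Equation (13) can be rewritten as

$$A_{(1)}1_n + B_{(1)}s = \cdots = A_{(d/k)}1_n + B_{(d/k)}s = 0.$$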
The adversary solves the above equation to obtain $s$. The analysis uses the following condition on $\ell$.

Definition 10 (Variance Condition). Consider the decomposition of the gradient of a loss function $\ell: \Theta \times \mathbb{R}^{k+1} \to \mathbb{R}$ as $\partial\ell(\theta; (x, y)) = \ell_0(\theta; x) + \ell_2(\theta; x)\,y$, where $\theta \in \mathbb{R}^k$, $x \in \mathbb{R}^k$, $y \in \{0,1\}$. The loss function $\ell$ is said to satisfy the variance condition if for every $\theta$, $\mathrm{Var}_x[\ell_2(\theta; x)]$ is bounded away from zero.



For the LP decoding attack, we need a stricter condition to achieve the guarantees of Theorem 2.2. We defer this discussion to the full version.
Theorem 4.2. Let $\ell$ be a differentiable loss function which satisfies the variance condition. Let $\lambda$ denote the Lipschitz constant of the gradient of the loss function $\ell$. Let $d \ge 2n$ and set $k = 1$. Then any privacy mechanism which for every database $D = (U|s)$, where $U \in \mathbb{R}^{n\times d}$ and $s \in \{0,1\}^n$, releases the M-estimators associated with the loss function $\ell$ of the models fitted between every column of $U$ and $s$ by adding $o(1/(\sqrt{n}\,\lambda))$ noise to each M-estimator is attribute non-private. The attack that achieves this attribute non-privacy violation runs in $O(dn^2)$ time.

Proof. Let $D = (U|s)$, where $U$ is a $\tau$-random matrix of dimension $n \times d$.

The M-estimator $\theta_i \in \mathbb{R}$ between the $i$th column $U_{(i)}$ of $U$ and $s$ is given by the solution to the equation:

$$\sum_{j=1}^n \partial\ell\left(\theta; (U_{(i)j}, s_j)\right) = 0.$$

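The adversary gets noisy approximations of $\theta_1,\dots,\theta_d$. Let $\tilde\theta_1,\dots,\tilde\theta_d$ be these noisy approximations, with $\tilde\theta_i = \theta_i + e_i$. Consider $\partial\mathcal{L}(\theta_i; (U_{(i)}, s))$ and $\partial\mathcal{L}(\tilde\theta_i; (U_{(i)}, s))$. By the Lipschitz condition,

$$\left|\partial\mathcal{L}\left(\tilde\theta_i; (U_{(i)}, s)\right) - \partial\mathcal{L}\left(\theta_i; (U_{(i)}, s)\right)\right| \le \Lambda\,|\tilde\theta_i - \theta_i| \le \lambda n\,|\tilde\theta_i - \theta_i| = \lambda n\,|e_i|.$$

This implies that

$$\left|\partial\mathcal{L}\left(\tilde\theta_i; (U_{(i)}, s)\right)\right| \le \left|\partial\mathcal{L}\left(\theta_i; (U_{(i)}, s)\right)\right| + \lambda n\,|e_i|.$$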
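Substituting the decomposition of $\partial\ell$ in the above inequality gives

$$\left|\sum_{j=1}^n \ell_0(\tilde\theta_i; U_{(i)j}) + \ell_2(\tilde\theta_i; U_{(i)j})\,s_j\right| \le \left|\sum_{j=1}^n \ell_0(\theta_i; U_{(i)j}) + \ell_2(\theta_i; U_{(i)j})\,s_j\right| + \lambda n\,|e_i|. \tag{14}$$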

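Let $A_1, B_1, A_2, B_2$ be four matrices of dimension $d \times n$ defined as follows:

$A_1$: the $i$th row of $A_1$ is $\left(\ell_0(\theta_i; U_{(i)1}),\dots,\ell_0(\theta_i; U_{(i)n})\right)$,

$B_1$: the $i$th row of $B_1$ is $\left(\ell_2(\theta_i; U_{(i)1}),\dots,\ell_2(\theta_i; U_{(i)n})\right)$,

$A_2$: the $i$th row of $A_2$ is $\left(\ell_0(\tilde\theta_i; U_{(i)1}),\dots,\ell_0(\tilde\theta_i; U_{(i)n})\right)$,

$B_2$: the $i$th row of $B_2$ is $\left(\ell_2(\tilde\theta_i; U_{(i)1}),\dots,\ell_2(\tilde\theta_i; U_{(i)n})\right)$.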

The adversary solves the following reconstruction problem to compute $s$:

$$A_2 1_n + B_2 s = 0. \tag{15}$$

From Equation (14) it follows that

$$\left\|A_2 1_n + B_2 s\right\| \le \left\|A_1 1_n + B_1 s\right\| + \lambda n\,\|e\|.$$

$B_2$: Least Singular Value. Since $U$ is a $\tau$-random matrix, $B_2$ is another random matrix. However, $B_2$ may not be centered (i.e., its entries might have non-zero means). We can re-express $B_2$ as

$$B_2 = \underbrace{B_2 - \mathbb{E}[B_2]}_{R} + \mathbb{E}[B_2].$$



The Lipschitz constant $\Lambda$ of $\mathcal{L}$ is at most $n$ times the Lipschitz constant of $\ell$, and therefore $\Lambda \le n\lambda$.
Here, $R$ is a $\tau'$-random matrix and $\mathbb{E}[B_2]$ is a matrix of the form $uv^\top$, where $u$ is a $d$-dimensional column vector with entries $\mathbb{E}_x[\ell_2(\tilde\theta_1; x)],\dots,\mathbb{E}_x[\ell_2(\tilde\theta_d; x)]$ and $v$ is the $n$-dimensional column vector of all ones.

Claim 5. $\Pr[\sigma_n(B_2) \le c_{19}\sqrt{d}] \le \exp(-c_{20} d)$.

Proof. $B_2 = R + uv^\top$, where $R$ is a $\tau'$-random matrix. The rank of $uv^\top$ is 1, and its operator norm can be polynomially bounded in $d$. Applying Lemma D.2 implies the result. □

Least Squares Attack. The least squares attack produces an estimate s by solving: 

s = argmiUg ||A2ln + B2s||. 

The analysis is similar to that of Theorem 2.1, except that each error term $e_i$ is scaled up by a factor of at most $\lambda n$. Therefore, if a privacy mechanism adds $o(1/(\sqrt{n}\lambda))$ noise to each $\theta_i$, then with exponentially high probability the least squares attack recovers a $1 - o(1)$ fraction of the entries of $s$. The running time of this attack is $O(dn^2)$.

□ 
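The attack can be exercised end to end on a toy instance. The sketch below is a minimal pure-Python illustration under assumptions that are not from the text: the nonsensitive attributes are $\pm1$, each released $\theta_i$ is the scalar minimizer of an L2-regularized logistic loss (the ridge weight, a hypothetical parameter `reg`, only keeps the minimizer finite), and the released values are perturbed by small uniform noise. For this loss the per-record derivative is $\ell_1(\theta;x) + \ell_2(\theta;x)s$ with $\ell_1(\theta;x) = x\,\sigma(\theta x) + (\mathrm{reg}/n)\theta$ and $\ell_2(\theta;x) = -x$, so the sketch builds $A_2$ and $B_2$, minimizes $\|A_2\mathbf{1}_n + B_2 s\|$ through the normal equations, and rounds to $\{0,1\}$. All function names are hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def ell1(theta, x, reg, n):
    """Part of the per-record derivative that does not multiply s_j."""
    return x * sigmoid(theta * x) + (reg / n) * theta


def ell2(theta, x):
    """Coefficient of s_j in the per-record derivative (here simply -x)."""
    return -x


def fit_theta(row, s, reg):
    """Minimize sum_j [ln(1 + e^{theta x_j}) - s_j theta x_j] + (reg/2) theta^2
    by bisection on its strictly increasing derivative (row entries are +-1)."""
    n = len(row)

    def grad(th):
        return sum(ell1(th, x, reg, n) + ell2(th, x) * sj for x, sj in zip(row, s))

    lo, hi = -(2 * n / reg + 1), 2 * n / reg + 1  # the root satisfies |theta| <= n/reg
    for _ in range(200):
        mid = (lo + hi) / 2
        if grad(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def solve_linear(M, b):
    """Solve the square system M y = b by Gaussian elimination with partial pivoting."""
    m = len(M)
    aug = [list(M[r]) + [b[r]] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for col in range(c, m + 1):
                aug[r][col] -= f * aug[c][col]
    y = [0.0] * m
    for r in range(m - 1, -1, -1):
        y[r] = (aug[r][m] - sum(aug[r][col] * y[col] for col in range(r + 1, m))) / aug[r][r]
    return y


def least_squares_attack(U, thetas, reg):
    """Estimate s from released thetas by minimizing ||A2 1_n + B2 s||, then rounding."""
    d, n = len(U), len(U[0])
    A2 = [[ell1(thetas[i], U[i][j], reg, n) for j in range(n)] for i in range(d)]
    B2 = [[ell2(thetas[i], U[i][j]) for j in range(n)] for i in range(d)]
    a = [sum(row) for row in A2]  # the vector A2 1_n
    # Normal equations of min ||a + B2 s||: (B2^T B2) s = -B2^T a.
    BtB = [[sum(B2[i][p] * B2[i][q] for i in range(d)) for q in range(n)] for p in range(n)]
    Bta = [-sum(B2[i][p] * a[i] for i in range(d)) for p in range(n)]
    return [1 if v > 0.5 else 0 for v in solve_linear(BtB, Bta)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, reg = 8, 60, 1.0
    s = [rng.randint(0, 1) for _ in range(n)]
    U = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(d)]
    released = [fit_theta(U[i], s, reg) + rng.uniform(-0.01, 0.01) for i in range(d)]
    print(least_squares_attack(U, released, reg) == s)
```

Although each $\theta_i$ is obtained by a nonlinear optimization, the attack itself only ever solves a linear least squares problem in $s$.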
A Preliminaries about Boolean Functions

We start with an alternate definition of non-degeneracy and show that it is equivalent to Definition 4. 
Definition 11. A boolean function $f : \{0,1\}^{k+1} \to \{0,1\}$ is called non-degenerate if

$\sum_{(\delta_1,\ldots,\delta_{k+1}) \in \{0,1\}^{k+1}} (-1)^{\delta_1 + \cdots + \delta_{k+1}} f(\delta_1,\ldots,\delta_{k+1}) \neq 0.$

Any function on the discrete cube $\{-1,1\}^{k+1}$ can be decomposed into a linear combination of characters, which are Walsh functions. This representation allows us to extend the function $f$ from the discrete cube to $\mathbb{R}^{k+1}$ as a multilinear (i.e., linear in each variable separately) polynomial. In what follows, we will use this extension. The following lemma shows that the non-degeneracy condition is equivalent to this polynomial having the maximal degree.




Lemma A.1. Definitions 11 and 4 are equivalent.
Proof. Consider the function $g : \{-1,1\}^{k+1} \to \{-1,1\}$ defined by

$g(\phi_1,\ldots,\phi_{k+1}) = 2f\big(\tfrac{1}{2}(1+\phi_1),\ldots,\tfrac{1}{2}(1+\phi_{k+1})\big) - 1.$

Let $\phi = (\phi_1,\ldots,\phi_{k+1})$. For $S \subseteq \{1,\ldots,k+1\}$ let $\chi_S(\phi) = \prod_{j\in S}\phi_j$ be the corresponding Walsh function. Then the function $g$ can be decomposed as

$g(\phi) = \sum_{S \subseteq \{1,\ldots,k+1\}} \hat g(S)\,\chi_S(\phi).$

Note that $\deg(f) = \deg(g)$, and so $\deg(f) = k+1$ iff $\hat g(\{1,\ldots,k+1\}) \neq 0$. We have

$\hat g(\{1,\ldots,k+1\}) = 2^{-k-1}\sum_{\phi\in\{-1,1\}^{k+1}} g(\phi)\prod_{j=1}^{k+1}\phi_j.$ (16)



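Since

$g(\phi) = 2f(1-\delta_1,\ldots,1-\delta_{k+1}) - 1 \quad\text{and}\quad \prod_{j=1}^{k+1}\phi_j = (-1)^{\sum_{j=1}^{k+1}\delta_j},$

where $\delta_j = (1-\phi_j)/2$, the lemma follows: the constant $-1$ contributes nothing to (16) because $\sum_{\phi}\prod_j\phi_j = 0$, and replacing each $\delta_j$ by $1-\delta_j$ only multiplies the sum in Definition 11 by $(-1)^{k+1}$. □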

Remark: There are 10 non-degenerate functions of two boolean variables: 

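AND: $\delta_1\wedge\delta_2$, $\delta_1\wedge(\neg\delta_2)$, $(\neg\delta_1)\wedge\delta_2$, $(\neg\delta_1)\wedge(\neg\delta_2)$,
OR: $\delta_1\vee\delta_2$, $\delta_1\vee(\neg\delta_2)$, $(\neg\delta_1)\vee\delta_2$, $(\neg\delta_1)\vee(\neg\delta_2)$,
XOR: $\delta_1\oplus\delta_2$, $\delta_1\oplus(\neg\delta_2)$.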

The remaining 6 boolean functions of two variables are degenerate. 
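Both the equivalence in Lemma A.1 and the count in the Remark can be checked by brute force. The sketch below (plain Python; function names are illustrative) enumerates truth tables, evaluates the alternating sum of Definition 11 and the top Walsh coefficient from Equation (16), and counts the non-degenerate functions of two variables.

```python
from itertools import product


def boolean_functions(m):
    """Yield every f: {0,1}^m -> {0,1} as a lookup function on m-tuples."""
    inputs = list(product((0, 1), repeat=m))
    for outputs in product((0, 1), repeat=len(inputs)):
        yield dict(zip(inputs, outputs)).__getitem__


def definition11_sum(f, m):
    """Sum over delta in {0,1}^m of (-1)^(delta_1 + ... + delta_m) f(delta)."""
    return sum((-1) ** sum(delta) * f(delta) for delta in product((0, 1), repeat=m))


def top_walsh_coefficient(f, m):
    """Equation (16): hat g({1..m}) for g(phi) = 2 f((1 + phi)/2) - 1."""
    total = 0
    for phi in product((-1, 1), repeat=m):
        g = 2 * f(tuple((1 + p) // 2 for p in phi)) - 1
        chi = 1
        for p in phi:
            chi *= p
        total += g * chi
    return total / 2 ** m


def is_non_degenerate(f, m):
    return definition11_sum(f, m) != 0


if __name__ == "__main__":
    print(sum(is_non_degenerate(f, 2) for f in boolean_functions(2)))  # prints 10
```

The two quantities differ only by the factor $2^{1-m}(-1)^m$ (with $m = k+1$), so one vanishes exactly when the other does.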

B Euclidean Section Property of Random Row Function Matrices 

In this section, we establish the Euclidean section property needed for Theorem 3.3 (LP decoding attack). 
Let us consider a function $h : \{-1,1\}^k \to \{-1,0,1\}$ having a representation as a multilinear polynomial of degree $k$. Let $P^{(h)}$ denote this multilinear polynomial. We first prove a proposition analogous to Proposition 3.6 for $h$. For $I \subseteq [k]$, let us define



$P_I^{(h)}(\phi_1,\ldots,\phi_k) = P^{(h)}(\phi'_1,\ldots,\phi'_k)$, where $\phi'_i = \phi_i$ if $i \in I$, else $\phi'_i = 0$.



That is, $P_I^{(h)}(\phi_1,\ldots,\phi_k)$ is the multilinear polynomial $P^{(h)}(\phi_1,\ldots,\phi_k)$ restricted to the variables in $I$. Under this notation,

$P_{[k]}^{(h)}(\phi_1,\phi_2,\ldots,\phi_k) = P^{(h)}(\phi_1,\phi_2,\ldots,\phi_k).$
Proposition B.1. Let $h$ be a function from $\{-1,1\}^k \to \{-1,0,1\}$ having a representation as a multilinear polynomial of degree $k$. Let $P^{(h)}$ be this multilinear polynomial. Let $(\phi_1,\ldots,\phi_k) \in \mathbb{R}^k$. Then

$\phi_1\cdots\phi_k = c_h(k)\sum_{I\subseteq[k]}(-1)^{k-|I|}\,P_I^{(h)}(\phi_1,\ldots,\phi_k),$

where $1/c_h(k)$ is the coefficient of the monomial corresponding to all $k$ variables in $P^{(h)}$.
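Proof. The proof is similar to that of Proposition 3.6. Since $P^{(h)}$ is a linear function of $\phi_1$,

$P^{(h)}(\phi_1,\ldots,\phi_k) - P^{(h)}_{[k]\setminus\{1\}}(\phi_1,\ldots,\phi_k) = \phi_1\,\frac{\partial}{\partial\phi_1}P^{(h)}(\phi_1,\ldots,\phi_k),$

where $\frac{\partial}{\partial\phi_1}P^{(h)}$ is the partial derivative of $P^{(h)}(\phi_1,\ldots,\phi_k)$ with respect to $\phi_1$. Repeating this for the other coordinates, we get

$\sum_{I\subseteq[k]}(-1)^{k-|I|}\,P_I^{(h)}(\phi_1,\ldots,\phi_k) = \phi_1\cdots\phi_k\,\frac{\partial^k P^{(h)}}{\partial\phi_1\cdots\partial\phi_k}.$

When $I = [k]$,

$h(\phi_1,\ldots,\phi_k) = P_{[k]}^{(h)}(\phi_1,\ldots,\phi_k),$

so the above equation can be re-expressed as

$h(\phi_1,\ldots,\phi_k) = \sum_{I\subsetneq[k]}(-1)^{k-|I|+1}\,P_I^{(h)}(\phi_1,\ldots,\phi_k) + \phi_1\cdots\phi_k\,\frac{\partial^k P^{(h)}}{\partial\phi_1\cdots\partial\phi_k}.$

The derivative in the last term on the right-hand side is the coefficient of the polynomial $P^{(h)}(\phi_1,\ldots,\phi_k)$ corresponding to the monomial $\phi_1\cdots\phi_k$, and we denote it by $1/c_h(k)$. □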
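Proposition B.1 only uses multilinearity, so it can be checked numerically for an arbitrary multilinear polynomial. The hypothetical helpers below represent $P^{(h)}$ as a dictionary from variable subsets to coefficients (random real coefficients here, rather than the Walsh coefficients of a particular $h$), compute $P_I^{(h)}$ by zeroing the variables outside $I$, and compare $c_h(k)\sum_{I\subseteq[k]}(-1)^{k-|I|}P_I^{(h)}(\phi)$ with $\phi_1\cdots\phi_k$ at a random real point.

```python
import random
from itertools import combinations


def restricted_value(poly, phi, keep):
    """P_I(phi): evaluate a multilinear polynomial {frozenset(vars): coef}
    after setting every variable outside `keep` to zero."""
    total = 0.0
    for mono, coef in poly.items():
        if mono <= keep:  # monomials touching a zeroed variable vanish
            term = coef
            for i in mono:
                term *= phi[i]
            total += term
    return total


def alternating_restriction_sum(poly, phi, k):
    """Sum over I subset of [k] of (-1)^(k - |I|) P_I(phi)."""
    return sum((-1) ** (k - size) * restricted_value(poly, phi, frozenset(I))
               for size in range(k + 1) for I in combinations(range(k), size))


def random_multilinear(k, rng):
    """A random multilinear polynomial of degree k in k variables."""
    poly = {frozenset(m): rng.uniform(-1, 1)
            for size in range(k + 1) for m in combinations(range(k), size)}
    poly[frozenset(range(k))] = 0.5  # top coefficient 1/c_h(k), kept away from 0
    return poly


if __name__ == "__main__":
    rng = random.Random(0)
    k = 4
    poly = random_multilinear(k, rng)
    phi = [rng.uniform(-2, 2) for _ in range(k)]
    c_hk = 1.0 / poly[frozenset(range(k))]
    print(phi[0] * phi[1] * phi[2] * phi[3], c_hk * alternating_restriction_sum(poly, phi, k))
```

The alternating sum cancels every monomial except $\phi_1\cdots\phi_k$, since $\sum_{I\supseteq M}(-1)^{k-|I|} = 0$ for every proper subset $M$ of $[k]$.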

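Let $V_{(1)} = (\phi^{(1)}_{ij}), \ldots, V_{(k)} = (\phi^{(k)}_{ij})$ be $k$ matrices with $\{-1,1\}$ entries and dimensions $d \times n$. Let us define a $d^k \times n$ matrix $\Pi_{P^{(h)}}(V_{(1)},\ldots,V_{(k)})$ as in Definition 5, using the multilinear polynomial $P^{(h)}$. More concretely, for $J = (j_1,j_2,\ldots,j_k) \in \{1,\ldots,d\}^k$, the $(J,a)$ entry of the matrix $\Pi_{P^{(h)}}(V_{(1)},\ldots,V_{(k)})$ is defined by the relation

$\Pi_{P^{(h)}}(V_{(1)},\ldots,V_{(k)})_{J,a} = P^{(h)}\big(\phi^{(1)}_{j_1 a},\ldots,\phi^{(k)}_{j_k a}\big).$

The matrices $\Pi_{P_I^{(h)}}(V_{(1)},\ldots,V_{(k)})$ are defined in the same way with $P_I^{(h)}$ in place of $P^{(h)}$, and we write $\Pi_h = \Pi_{P^{(h)}}$.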
Using this definition, the following corollary follows immediately from Proposition B.l. 
Corollary B.2. Let $V_{(1)},\ldots,V_{(k)}$ be $k$ matrices with $\{-1,1\}$ entries and dimensions $d\times n$. Then

$V_{(1)}\odot\cdots\odot V_{(k)} = c_h(k)\Big(\sum_{I\subsetneq[k]}(-1)^{k-|I|}\,\Pi_{P_I^{(h)}}(V_{(1)},\ldots,V_{(k)}) + \Pi_h(V_{(1)},\ldots,V_{(k)})\Big).$
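To make the $d^k\times n$ row function matrices concrete, the sketch below builds $\Pi_{P_I^{(h)}}(V_{(1)},\ldots,V_{(k)})$ entry by entry from the defining relation, builds $V_{(1)}\odot\cdots\odot V_{(k)}$, and checks Corollary B.2 entrywise. The choice of $h$ as three-input majority (whose multilinear polynomial $\tfrac12(\phi_1+\phi_2+\phi_3) - \tfrac12\phi_1\phi_2\phi_3$ has degree 3, so $c_h(3) = -2$) and all helper names are illustrative assumptions.

```python
import random
from itertools import combinations, product


def walsh_polynomial(h, k):
    """Multilinear expansion of h on {-1,1}^k as {frozenset(S): hat h(S)}."""
    points = list(product((-1, 1), repeat=k))
    poly = {}
    for size in range(k + 1):
        for S in combinations(range(k), size):
            total = 0.0
            for phi in points:
                chi = 1
                for i in S:
                    chi *= phi[i]
                total += h(phi) * chi
            poly[frozenset(S)] = total / 2 ** k
    return poly


def restricted_value(poly, phi, keep):
    """P_keep(phi): drop every monomial that uses a variable outside `keep`."""
    total = 0.0
    for mono, coef in poly.items():
        if mono <= keep:
            term = coef
            for i in mono:
                term *= phi[i]
            total += term
    return total


def row_function_matrix(poly, Vs, keep):
    """d^k x n matrix whose (J, a) entry is P_keep(V_1[j_1][a], ..., V_k[j_k][a])."""
    d, n = len(Vs[0]), len(Vs[0][0])
    return [[restricted_value(poly, [V[j][a] for V, j in zip(Vs, J)], keep) for a in range(n)]
            for J in product(range(d), repeat=len(Vs))]


def row_product(Vs):
    """V_1 (.) ... (.) V_k: the (J, a) entry is prod_i V_i[j_i][a]."""
    d, n = len(Vs[0]), len(Vs[0][0])
    rows = []
    for J in product(range(d), repeat=len(Vs)):
        row = []
        for a in range(n):
            entry = 1.0
            for V, j in zip(Vs, J):
                entry *= V[j][a]
            row.append(entry)
        rows.append(row)
    return rows


def corollary_b2_rhs(poly, Vs):
    """c_h(k) * (sum over proper I of (-1)^(k - |I|) Pi_{P_I} + Pi_h)."""
    k = len(Vs)
    acc = row_function_matrix(poly, Vs, frozenset(range(k)))  # Pi_h
    for size in range(k):  # proper subsets of [k] only
        for I in combinations(range(k), size):
            M = row_function_matrix(poly, Vs, frozenset(I))
            sign = (-1) ** (k - size)
            acc = [[x + sign * y for x, y in zip(r1, r2)] for r1, r2 in zip(acc, M)]
    c_hk = 1.0 / poly[frozenset(range(k))]
    return [[c_hk * x for x in row] for row in acc]


def majority(phi):
    """Majority of three +-1 votes; its multilinear polynomial has degree 3."""
    return 1 if sum(phi) > 0 else -1
```

For example, with `Vs` a list of three random $3\times 4$ matrices of $\pm1$ entries and `poly = walsh_polynomial(majority, 3)`, the matrices `row_product(Vs)` and `corollary_b2_rhs(poly, Vs)` should agree up to rounding.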
Theorem B.3. Let $k, n, d$ be natural numbers. Consider a $d\times n$ matrix $V$ with independent Bernoulli entries taking values $-1$ and $1$ with probability $1/2$ each. Let $h$ be a function from $\{-1,1\}^k \to \{-1,0,1\}$ having a representation as a multilinear polynomial of degree $k$. Then the matrix $\Pi_h(V,\ldots,V)$ satisfies

$\Pr\Big[\|\Pi_h(V,\ldots,V)\| \ge c_6\big(\sqrt{d^k} + \sqrt{n}\big)\Big] \le c_7\exp\big(-c_8 n^{1/(2k)}\big).$

The constants $c_6, c_7, c_8$ depend only on $k$.
Proof. From Corollary B.2, we know that

$\Pi_h(V,\ldots,V) = \Pi_{P_{[k]}^{(h)}}(V,\ldots,V) = \frac{V\odot\cdots\odot V}{c_h(k)} - \sum_{I\subsetneq[k]}(-1)^{k-|I|}\,\Pi_{P_I^{(h)}}(V,\ldots,V).$ (17)



To prove the theorem, we use induction over the size of $I$. Our inductive claim is that for every $I \subseteq [k]$,

$\Pr\Big[\|\Pi_{P_I^{(h)}}(V,\ldots,V)\| \ge c_6\big(\sqrt{d^{|I|}} + \sqrt{n}\big)\Big] \le c_7\exp\big(-c_8 n^{1/(2|I|)}\big),$

where the constants $c_6, c_7, c_8$ depend only on $|I|$.
Step 1. Let $|I| = 1$. Then

$\Pi_{P_I^{(h)}}(V,\ldots,V) = c_9 V$

for some constant $c_9$. Therefore,

$\|\Pi_{P_I^{(h)}}(V,\ldots,V)\| = |c_9|\,\|V\|.$

Since $V$ is a random $\{-1,1\}$ matrix, it is well known (see, e.g., [22]) that there exist constants $c_{10}, c_{11}$ such that

$\Pr\big[\|V\| \ge c_{10}(\sqrt{d}+\sqrt{n})\big] \le \exp(-c_{11}n).$

Therefore,

$\Pr\big[\|\Pi_{P_I^{(h)}}(V,\ldots,V)\| \ge |c_9|\,c_{10}(\sqrt{d}+\sqrt{n})\big] \le \exp(-c_{11}n).$

Therefore, there exist constants $c_6, c_7, c_8$ such that

$\Pr\big[\|\Pi_{P_I^{(h)}}(V,\ldots,V)\| \ge c_6(\sqrt{d}+\sqrt{n})\big] \le c_7\exp(-c_8 n) \le c_7\exp\big(-c_8 n^{1/2}\big).$

This completes the basis for induction.

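Step 2. Let $|I| = k$, i.e., $I = \{1,\ldots,k\} = [k]$. We want to bound $\|\Pi_{P_{[k]}^{(h)}}(V,\ldots,V)\|$. By the inductive hypothesis, we assume that for every $L \subsetneq [k]$,

$\Pr\Big[\|\Pi_{P_L^{(h)}}(V,\ldots,V)\| \ge c_6\big(\sqrt{d^{|L|}} + \sqrt{n}\big)\Big] \le c_7\exp\big(-c_8 n^{1/(2|L|)}\big),$ (18)

where the constants $c_6, c_7, c_8$ depend only on $|L|$. From Theorem 3.5, we know that the $k$-times entrywise product $V\odot\cdots\odot V$ satisfies the following norm condition (as $V$ is a matrix with independent $\tau$-random entries):

$\Pr\Big[\|V\odot\cdots\odot V\| \ge c_3\big(\sqrt{d^k} + \sqrt{n}\big)\Big] \le \exp\big(-c_4 n^{1/(2k)}\big),$ (19)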
where again the constants $c_3, c_4$ depend only on $k$. From Equations (18) and (19), we get that there exist constants $c_6, c_7, c_8$ (depending only on $k$) such that

$\Pr\Big[\sum_{L\subsetneq[k]}\|\Pi_{P_L^{(h)}}(V,\ldots,V)\| + \|V\odot\cdots\odot V\| \ge c_6\big(\sqrt{d^k}+\sqrt{n}\big)\Big] \le c_7\exp\big(-c_8 n^{1/(2k)}\big).$ (20)

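From Equation (17),

$\Pr\Big[\Big\|\frac{V\odot\cdots\odot V}{c_h(k)} - \sum_{L\subsetneq[k]}(-1)^{k-|L|}\,\Pi_{P_L^{(h)}}(V,\ldots,V)\Big\| \ge c_6\big(\sqrt{d^k}+\sqrt{n}\big)\Big] \le c_7\exp\big(-c_8 n^{1/(2k)}\big).$

The last inequality follows from the triangle inequality and Equation (20), with the factor $1/|c_h(k)|$ absorbed into the constants. Note that for $I = [k]$, $\|\Pi_{P_I^{(h)}}(V,\ldots,V)\| = \|\Pi_h(V,\ldots,V)\|$. This completes the proof of the theorem. □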
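Theorem B.3 is an asymptotic statement with unspecified constants, so no finite experiment can confirm it; still, it can help to see the quantities involved. The sketch below (hypothetical helper names; $h$ taken to be three-input majority as one example of a degree-$k$ function) forms $\Pi_h(V,\ldots,V)$ for a random $\pm1$ matrix $V$, using the fact that $P^{(h)}$ agrees with $h$ on the cube, estimates its operator norm by power iteration, and reports the ratio to $\sqrt{d^k}+\sqrt{n}$.

```python
import math
import random
from itertools import product


def majority(phi):
    """Majority of +-1 votes; for k = 3 it is a degree-3 multilinear polynomial."""
    return 1 if sum(phi) > 0 else -1


def pi_h_same(h, V, k):
    """Pi_h(V, ..., V) for a +-1 matrix V: on the cube P^(h) equals h, so the
    (J, a) entry is h(V[j_1][a], ..., V[j_k][a])."""
    d, n = len(V), len(V[0])
    return [[h(tuple(V[j][a] for j in J)) for a in range(n)]
            for J in product(range(d), repeat=k)]


def operator_norm_estimate(M, iters=100, seed=0):
    """Power iteration on M^T M; returns ||M x|| for the final unit vector x,
    a lower bound on the operator norm ||M|| that improves with iters."""
    rng = random.Random(seed)
    rows, n = len(M), len(M[0])
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        y = [sum(M[r][a] * x[a] for a in range(n)) for r in range(rows)]  # y = M x
        x = [sum(M[r][a] * y[r] for r in range(rows)) for a in range(n)]  # x = M^T y
        norm = math.sqrt(sum(t * t for t in x))
        x = [t / norm for t in x]
    Mx = [sum(M[r][a] * x[a] for a in range(n)) for r in range(rows)]
    return math.sqrt(sum(t * t for t in Mx))


if __name__ == "__main__":
    rng = random.Random(0)
    k, d, n = 3, 5, 30
    V = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(d)]
    est = operator_norm_estimate(pi_h_same(majority, V, k))
    print(est, est / (math.sqrt(d ** k) + math.sqrt(n)))
```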

Theorem B.4. Let $k, q, n, d$ be natural numbers. Assume that $n \le c\,d^k/\log^{(q)} d$. Consider a $d\times n$ matrix $V$ with independent Bernoulli entries taking values $-1$ and $1$ with probability $1/2$ each. Let $h$ be a function $\{-1,1\}^k \to \{-1,0,1\}$ having a representation as a multilinear polynomial of degree $k$. Then, with exponentially high probability, the matrix $\Pi_h(V,\ldots,V)$ is a Euclidean section.

Proof. First note that $\Pi_h(V,\ldots,V)$ is an operator from $\mathbb{R}^n$ to $\mathbb{R}^{d^k}$. As mentioned before, by the Cauchy-Schwarz inequality,

$\forall x \in S^{n-1}: \quad \sqrt{d^k}\,\|\Pi_h(V,\ldots,V)x\|_2 \ge \|\Pi_h(V,\ldots,V)x\|_1.$

Note that the proof idea of Theorem 3.8 also works for $h$. That is, if $n \le c\,d^k/\log^{(q)} d$, then

$\Pr\Big[\exists x\in S^{n-1}: \|\Pi_h(V,\ldots,V)x\|_1 \le C' d^k\Big] \le c_1\exp(-c_2 d).$

In other words,

$\Pr\Big[\forall x\in S^{n-1}: \|\Pi_h(V,\ldots,V)x\|_1 \ge C' d^k\Big] \ge 1 - c_1\exp(-c_2 d).$



Theorem B.3 implies that

$\Pr\Big[\forall x\in S^{n-1}: \|\Pi_h(V,\ldots,V)x\|_2 \le c_6\big(\sqrt{d^k}+\sqrt{n}\big)\Big] \ge 1 - c_7\exp\big(-c_8 n^{1/(2k)}\big).$

If $n \le c\,d^k/\log^{(q)} d$, then there exists a constant $c_{13}$ (depending only on $k$) such that

$\Pr\Big[\forall x\in S^{n-1}: \|\Pi_h(V,\ldots,V)x\|_2 \le c_{13}\sqrt{d^k}\Big] \ge 1 - c_7\exp\big(-c_8 n^{1/(2k)}\big).$



Therefore, with probability at least $1 - c_1\exp(-c_2 d) - c_7\exp\big(-c_8 n^{1/(2k)}\big)$, there exists an $\alpha > 0$ (depending only on $k$ and $q$) such that

$\forall x\in S^{n-1}: \quad \|\Pi_h(V,\ldots,V)x\|_1 \ge \alpha\sqrt{d^k}\,\|\Pi_h(V,\ldots,V)x\|_2.$

This shows that, with exponentially high probability, the matrix $\Pi_h(V,\ldots,V)$ is a Euclidean section. □
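The ratio that defines a Euclidean section here, $\|\Pi_h x\|_1/(\sqrt{d^k}\,\|\Pi_h x\|_2)$, is easy to compute for a given $x$. The sketch below (illustrative names; majority again stands in for $h$) evaluates it on random directions. Sampling can only upper-bound the constant $\alpha$ in the proof, since the property must hold for every $x\in S^{n-1}$; the Cauchy-Schwarz step guarantees that the ratio never exceeds 1.

```python
import math
import random
from itertools import product


def majority(phi):
    """Majority of +-1 votes, used as an example degree-3 function h."""
    return 1 if sum(phi) > 0 else -1


def pi_h_same(h, V, k):
    """Pi_h(V, ..., V) for a +-1 matrix V, with (J, a) entry h(V[j_1][a], ..., V[j_k][a])."""
    d, n = len(V), len(V[0])
    return [[h(tuple(V[j][a] for j in J)) for a in range(n)]
            for J in product(range(d), repeat=k)]


def section_ratio(M, x):
    """||M x||_1 / (sqrt(m) ||M x||_2) for an m x n matrix M; at most 1 by Cauchy-Schwarz."""
    y = [sum(r[a] * x[a] for a in range(len(x))) for r in M]
    l1 = sum(abs(t) for t in y)
    l2 = math.sqrt(sum(t * t for t in y))
    return l1 / (math.sqrt(len(M)) * l2)


def sampled_min_ratio(M, trials, rng):
    """Smallest ratio over random directions x; only an upper bound on the
    worst case over all of S^{n-1}."""
    n = len(M[0])
    best = 1.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = min(best, section_ratio(M, x))
    return best


if __name__ == "__main__":
    rng = random.Random(2)
    V = [[rng.choice((-1, 1)) for _ in range(20)] for _ in range(5)]
    print(sampled_min_ratio(pi_h_same(majority, V, 3), 50, rng))
```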
C MLEs for Linear and Logistic Regression



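Consider the linear regression problem $s = X\theta + e$. The likelihood function for linear regression (under the assumption that the entries in $s$ are normally distributed) is:

$\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big(-\frac{(s_i - \langle X_i,\theta\rangle)^2}{2\sigma^2}\Big),$

where $X_i$ is the $i$th row of $X$ and $\sigma^2$ is the noise variance. The log-likelihood is:

$-\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(s_i - \langle X_i,\theta\rangle)^2.$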

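For the logistic regression problem, the likelihood function (under the assumption that the entries in $s$ are binomially distributed) is:

$\prod_{i=1}^{n}\Big(\frac{\exp(\langle X_i,\theta\rangle)}{1+\exp(\langle X_i,\theta\rangle)}\Big)^{s_i}\Big(\frac{1}{1+\exp(\langle X_i,\theta\rangle)}\Big)^{1-s_i}.$

The log-likelihood is:

$\mathcal{L}_{\log}(\theta; s, X) = \sum_{i=1}^{n} s_i\langle X_i,\theta\rangle - \ln\big(1+\exp(\langle X_i,\theta\rangle)\big).$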
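Both log-likelihoods can be checked against their defining products, and the logistic gradient $\sum_i (s_i - \sigma(\langle X_i,\theta\rangle))X_i$, with $\sigma$ the logistic function, is visibly affine in $s$, which is the property the linear reconstruction attacks rely on. The plain-Python helpers below are an illustrative sketch; their names are not from the text.

```python
import math


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_likelihood_linear(theta, sigma, s, X):
    """-n/2 ln(2 pi) - n/2 ln(sigma^2) - (1 / (2 sigma^2)) sum_i (s_i - <X_i, theta>)^2."""
    n = len(s)
    rss = sum((si - inner(xi, theta)) ** 2 for si, xi in zip(s, X))
    return -n / 2 * math.log(2 * math.pi) - n / 2 * math.log(sigma ** 2) - rss / (2 * sigma ** 2)


def log_likelihood_logistic(theta, s, X):
    """L_log(theta; s, X) = sum_i s_i <X_i, theta> - ln(1 + exp(<X_i, theta>))."""
    return sum(si * inner(xi, theta) - math.log1p(math.exp(inner(xi, theta)))
               for si, xi in zip(s, X))


def grad_logistic(theta, s, X):
    """Gradient of L_log in theta: sum_i (s_i - sigmoid(<X_i, theta>)) X_i.
    It is affine in s, which is what the linear reconstruction attack exploits."""
    g = [0.0] * len(theta)
    for si, xi in zip(s, X):
        p = 1.0 / (1.0 + math.exp(-inner(xi, theta)))
        for c, xc in enumerate(xi):
            g[c] += (si - p) * xc
    return g
```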



D Least Singular Value of Perturbed Random Matrices 

In this section, we bound the least singular value of a random matrix perturbed by a low rank matrix. We 
need the following fact. 

Lemma D.1. Let $R$ be a $d\times n$ random matrix with independent centered subgaussian entries with variances uniformly bounded below ($\tau$-random entries fall into this category), and with $d \ge c_{14}n$. For any $z \in \mathbb{R}^d$,

$\Pr\Big[\exists x\in S^{n-1}: \|Rx + z\| \le c_{15}\sqrt{d}\Big] \le \exp(-c_{16}d).$

The constants $c_{14}, c_{15}$, and $c_{16}$ are all independent of $n$ and $d$.

For $z = 0$, this follows from Proposition 2.5 of [19] and the standard estimate of the norm of a subgaussian matrix (see, e.g., Proposition 2.3 of [18]). The proof for a general $z$ follows the same lines.

The lemma below gives an estimate of the smallest singular value of a perturbed random matrix. 

Lemma D.2. Let $R$ be a $d\times n$ random matrix with independent centered subgaussian entries with variances uniformly bounded below, and $d \ge c_{14}n$. Let $D$ be a deterministic $d\times n$ matrix such that $\mathrm{rank}(D) = K$ and $\|D\| \le d^\alpha$, where $\alpha > 0$ is a constant. If

$K \le \begin{cases} c_{17}\,d/\log d & \text{if } \alpha \ge 1/2,\\ c_{18}\,d & \text{if } \alpha < 1/2,\end{cases}$

then

$\Pr\big[\sigma_n(R + D) \le c_{19}\sqrt{d}\big] \le \exp(-c_{20}d).$

The constants $c_{17}, c_{18}, c_{19}$, and $c_{20}$ are all independent of $n$ and $d$.
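Claim 5 and Lemma D.2 concern how a low-rank, non-centered part affects the least singular value. The sketch below (pure Python; helper names are illustrative) estimates $\sigma_n$ by inverse iteration on $M^\top M$ and compares a $\pm1$ matrix $R$ with $R + uv^\top$, where $u$ has all entries $0.5$ and $v$ is the all-ones vector, loosely mimicking the decomposition of $B_2$. The sizes and the shift are arbitrary choices, and the printed ratios $\sigma_n/\sqrt{d}$ are only an illustration, not an estimate of $c_{19}$.

```python
import math
import random


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a square system A y = b."""
    m = len(A)
    aug = [list(A[r]) + [b[r]] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for col in range(c, m + 1):
                aug[r][col] -= f * aug[c][col]
    y = [0.0] * m
    for r in range(m - 1, -1, -1):
        y[r] = (aug[r][m] - sum(aug[r][col] * y[col] for col in range(r + 1, m))) / aug[r][r]
    return y


def least_singular_value(M, iters=60, seed=0):
    """sigma_n(M) for a tall d x n matrix M, via inverse iteration on G = M^T M.
    Returns ||M x|| for the final unit vector x: an upper bound on sigma_n
    that converges to it."""
    n = len(M[0])
    G = [[sum(r[p] * r[q] for r in M) for q in range(n)] for p in range(n)]
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        x = solve_linear(G, x)
        norm = math.sqrt(sum(t * t for t in x))
        x = [t / norm for t in x]
    Mx = [sum(r[a] * x[a] for a in range(n)) for r in M]
    return math.sqrt(sum(t * t for t in Mx))


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 200, 10
    R = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(d)]
    M = [[R[i][j] + 0.5 for j in range(n)] for i in range(d)]  # R + u v^T, u = 0.5 * ones
    print(least_singular_value(R) / math.sqrt(d), least_singular_value(M) / math.sqrt(d))
```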

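Proof. Set $Q = DB_2^n$ (where $B_2^n$ denotes the unit Euclidean ball in $\mathbb{R}^n$), and let $\varepsilon = c_{15}\sqrt{d}/2$. Since $Q$ lies in a $K$-dimensional Euclidean ball of radius $\|D\| \le d^\alpha$, the volumetric estimate gives an $\varepsilon$-net $\mathcal{N}$ in $Q$ of cardinality at most

$|\mathcal{N}| \le \Big(1 + \frac{2d^\alpha}{\varepsilon}\Big)^K \le \exp(c_{21}K\log d)$

for $\alpha \ge 1/2$, and $|\mathcal{N}| \le \exp(c_{22}K)$ for $\alpha < 1/2$. Assume that there exists $x \in S^{n-1}$ such that $\|(R+D)x\| \le c_{15}\sqrt{d}/2$. Choose $z \in \mathcal{N}$ so that $\|Dx - z\| \le \varepsilon$. Then

$\|Rx + z\| \le \|(R+D)x\| + \|z - Dx\| \le c_{15}\sqrt{d}/2 + \varepsilon = c_{15}\sqrt{d}.$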

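Lemma D.1 and the union bound yield

$\Pr\Big[\exists x\in S^{n-1}: \|(R+D)x\|_2 \le c_{15}\sqrt{d}/2\Big] \le \Pr\Big[\exists x\in S^{n-1}\ \exists z\in\mathcal{N}: \|Rx + z\|_2 \le c_{15}\sqrt{d}\Big] \le |\mathcal{N}|\cdot\exp(-c_{16}d) \le \exp(-c_{20}d).$

The last inequality follows from the assumption on $K$. Here, $c_{21}$ and $c_{22}$ are constants independent of $n$ and $d$. □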
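The final step, which uses the assumption on $K$, can be spelled out as a short calculation. It assumes the net bounds $|\mathcal{N}| \le \exp(c_{21}K\log d)$ for $\alpha \ge 1/2$ and $|\mathcal{N}| \le \exp(c_{22}K)$ for $\alpha < 1/2$, and shows that choosing $c_{17} \le c_{16}/(2c_{21})$ and $c_{18} \le c_{16}/(2c_{22})$ suffices, with $c_{19} = c_{15}/2$ and $c_{20} = c_{16}/2$.

```latex
\begin{align*}
\alpha \ge \tfrac12:\quad |\mathcal{N}|\,e^{-c_{16}d}
  &\le \exp\!\big(c_{21}K\log d - c_{16}d\big)
   \le \exp\!\big((c_{21}c_{17} - c_{16})\,d\big)
   \le \exp\!\big(-\tfrac{c_{16}}{2}\,d\big),\\
\alpha < \tfrac12:\quad |\mathcal{N}|\,e^{-c_{16}d}
  &\le \exp\!\big(c_{22}K - c_{16}d\big)
   \le \exp\!\big((c_{22}c_{18} - c_{16})\,d\big)
   \le \exp\!\big(-\tfrac{c_{16}}{2}\,d\big).
\end{align*}
```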